SOL 81-21: Polynomial Local Improvement Algorithms in Combinatorial Optimization, Craig A. Tovey

Local improvement algorithms are widely used to solve a variety of problems,
including linear programming, integer programming, and linear complementarity.
One reason for the popularity of local improvement methods is that they tend to
run quickly on problems encountered in practice. On the other hand, for some
particular optimization problems, worst case examples have been constructed on
which the running time grows exponentially with the size of the problem. It is
natural to ask whether these are characteristics of local improvement algorithms
in general, and to seek a more exact description of the performance of these
algorithms.

The subject of this report is an analysis of the expected, or average case,
performance of local improvement algorithms. The first chapter presents the
basic model, defines the combinatorial structures which are the basis for the
analysis, and describes the randomness assumptions upon which the expectations
are based. The second chapter examines these structures in more detail,
including an analysis of both best and worst case performance. The third chapter
discusses simulation results which predict an approximately linear average case
performance, and proves an O(n^2 log n) upper bound for two of the random
distributions assumed. Chapter Four proves some extensions and sharper versions
of this upper bound. The fifth chapter applies the model to principal pivoting
algorithms for the linear complementarity problem, and to the simplex method.
Although local improvement is not guaranteed to find a global optimum for all
problems, most notably those that are NP-complete, it is nonetheless often used
in these cases. Chapter Six discusses these applications.
CHAPTER 1.

A  MODEL  OF  LOCAL  IMPROVEMENT  ALGORITHMS 


1.1  Introduction 


Suppose  you  are  at  the  base  of  a  mountain  in  the  middle  of  the  night,  and  you 
want to climb to the top. The moon is dim, so all you have to light the way is a
flashlight,  which  lets  you  see  the  ground  within  a  ten  foot  radius.  What  do  you  do? 
You  sweep  the  area  around  you  with  the  flashlight,  find  a  nearby  spot  that  is  a  little 
higher  than  where  you  are  standing,  and  move  there.  Then  you  repeat  this  process, 
always  increasing  your  elevation,  until  you  reach  a  peak —  a  point  higher  than  the 
area  around  it.  If  the  mountain  has  only  one  peak,  you  are  guaranteed  to  get  to 
the  top.  Such  a  method  is  called  a  hill  climbing  or  local  improvement  algorithm. 

Hill  climbing  methods  are  used  in  algorithms  for  linear  programming,  linear 
complementarity,  artificial  intelligence  applications,  and  many  other  combinatorial 
optimization  problems.  Most  of  the  analysis  of  these  methods  has  dealt  with  the 
worst  case  performance  of  a  particular  algorithm.  More  recently,  there  has  been 
some  work  concerning  the  average  case  performance  of  the  Simplex  Method  (Ross, 
1981)  which,  however,  fails  to  take  into  account  the  combinatorial  structure  of  the 
problem.  We  present  a  general  model  which  overcomes  this  difficulty  and  which  can 
be  applied  to  a  variety  of  local  improvement  algorithms. 

In an optimization problem, the horizontal cross section of the mountain
corresponds to the domain, or search space; the height of a spot of ground
corresponds to the value of the objective function we wish to maximize over the
domain. Suppose, then, we have a real valued function f whose domain is the set
of vertices of the n-cube (i.e., the set of n-tuples of zeroes and ones), and that
we wish to maximize f(x) over this space. We can assume that all the values of f
are distinct, for if f(x) = f(y) we say f(x) > f(y) if x is lexicographically
greater than y (see Dantzig, 1963). For example, if f(0110) = f(0101) we say that
f(0110) is the larger.
The domain of the function can be thought of as a set of boolean decision variables:
many optimization problems may be cast in this form (see Cook, 1971). If f is to
be minimized, we maximize −f. In some applications, such as the simplex method,
not all vertices of the hypercube correspond to feasible points. If a vertex x is not
feasible, we apply penalties for constraint violation to the value of f(x).

There is a natural notion of distance between two vertices of the n-cube: the
number of components in which they differ. This distance is a metric and is known
as the Hamming distance (for 0-1 vectors it equals the square of the Euclidean
distance). If x and y are at a distance of zero, then x = y; if x and y are at a
distance of one, they share an edge and are said to be adjacent or neighbors. A
vertex whose function value is greater than that of each of its n neighbors is
called a local maximum. If f has the property that a local maximum is a global
maximum we say that f is Local-Global, or LG for short. The LG property is of
course reminiscent of the property that a local minimum of a convex function is a
global one; see also (Dearing, 1976).

If f is LG, a local improvement algorithm will solve the problem of maximizing
f over the stated domain. In particular, we define the Optimal Adjacency (OA)
algorithm as follows:

Optimal  Adjacency  Algorithm 

0.  Start  with  any  vertex  x. 

1.  If  x  is  locally  optimal,  stop  with  x  the  solution.  Otherwise  proceed  to  2. 

2.  Let  y  be  the  optimal  vertex  adjacent  to  x.  Set  x  equal  to  y  and  go  to  1. 

Since  the  domain  is  finite  and  has  only  one  local  optimum,  the  algorithm  must 
terminate  after  finitely  many  steps  with  the  correct  solution. 
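The OA steps above can be sketched in Python. This is a minimal illustration under assumptions that are not in the text: a vertex is encoded as an n-bit integer (bit i holds coordinate i), f is any Python callable, and ties are broken by comparing the pair (f(v), v), which orders equal values by the integer encoding, one way of realizing a lexicographic rule. The names `neighbors` and `optimal_adjacency` are hypothetical. Iterations are counted so that the final optimality check counts as one iteration.

```python
from typing import Callable, List, Tuple


def neighbors(x: int, n: int) -> List[int]:
    """The n vertices at Hamming distance one from x (flip one bit each)."""
    return [x ^ (1 << i) for i in range(n)]


def optimal_adjacency(f: Callable[[int], float], n: int, start: int) -> Tuple[int, int]:
    """Run the OA algorithm from `start`; return (local optimum, iterations)."""
    key = lambda v: (f(v), v)  # makes all values distinct: ties broken by encoding
    x, iterations = start, 0
    while True:
        iterations += 1
        best = max(neighbors(x, n), key=key)  # step 2: the optimal adjacent vertex
        if key(best) < key(x):                # step 1: x is locally optimal
            return x, iterations
        x = best


# Hypothetical LG test function: minus the Hamming distance to the vertex 1010.
f = lambda x: -bin(x ^ 0b1010).count("1")
print(optimal_adjacency(f, 4, 0b0101))  # (10, 5)
```

Each step here costs n evaluations of f, matching the count given in the next paragraph.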

Note  that  every  vertex  has  only  n  neighbors  and  that  the  diameter  of  the  space 
(the  maximum  possible  Hamming  distance  between  any  two  vertices)  is  also  n.  A 
single  iteration  of  the  OA  algorithm  requires  at  most  n  function  evaluations,  and 
it  is  quite  possible  for  the  number  of  iterations  to  be  a  low  order  polynomial.  The 
next  section  deals  with  the  problem  of  how  many  iterations  the  algorithm  can  be 
expected to take for a particular instance of an LG problem.


1.2  OATs. 


If we are given a particular LG function f, we can construct a directed tree to
show how many iterations the optimal adjacency algorithm will require:

i)  Each  vertex  of  the  n-cube  corresponds  to  a  node  of  the  tree. 

ii)  The  father  of  a  vertex  is  its  optimal  adjacent  vertex;  if  a  vertex  is  a  local 
optimum,  it  has  no  father. 


The tree is called an Optimal Adjacency Tree, or OAT. Its root is the local
(hence globally unique) optimum. The OAT displays the path followed by the
algorithm by going from son to father on the tree (a biologically backwards
progression). It should be emphasized that for any instance of any local-global
problem there is a unique OAT which describes the action of the OA algorithm on
that instance. Figure 1.1 shows four possible OATs for n = 2,

Figure 1.1

and Figure 1.2 shows a possible OAT for n = 3.

Figure 1.2


The  set  of  OATs  of  order  n  has  a  2n-fold  symmetry.  Any  of  the  2n  vertices 
Suppose that we were going to run the OA algorithm on a particular instance
of  a  problem,  and  we  knew  that  this  instance  had  the  structure  shown  in  Figure  1.2. 
How  many  iterations  would  we  expect  the  algorithm  to  take?  If  the  starting  vertex 
is  chosen  at  random,  there  would  be  an  equal  probability  of  starting  at  each  of  the 
8  vertices.  In  general  for  each  starting  vertex,  the  path  to  the  root  in  the  OAT  is 
by  definition  the  path  the  optimal  adjacency  algorithm  will  follow.  Thus  the  height 
or  pathlength  of  each  vertex  in  the  tree  is  the  number  of  iterations  the  algorithm 
would  need  to  reach  the  optimum  from  that  vertex.  The  mean  pathlength  of  the 
tree  (the  mean  of  the  pathlengths  of  all  the  nodes  in  the  tree)  is  precisely  equal  to 
the expected number of iterations of the OA algorithm. Thus for a problem which
has the structure of the tree in Figure 1.2, the OA algorithm would be expected
to take (1×1 + 3×2 + 3×3 + 1×4)/8 = 20/8 = 2½ iterations. As another example,
the expected number of iterations required for any LG problem with n = 2 is two,
because all OATs for n = 2 have a mean pathlength of two: the root has pathlength
one, its two neighbors must be its sons and have pathlength two, and the remaining
vertex has pathlength three.
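The pathlength bookkeeping can be checked mechanically. The sketch below, with hypothetical helper names, builds the OAT (or OAF) father map from an ordering listed best vertex first and averages the pathlengths, counting the root as pathlength 1. Vertices are n-bit integers. The usage ordering 000, 100, 010, 001, 110, 101, 011, 111 is an assumed example with the same level sizes (1, 3, 3, 1) as Figure 1.2; it is not a reconstruction of any ordering in the text.

```python
from fractions import Fraction
from typing import Dict, List


def oat_fathers(ordering: List[int], n: int) -> Dict[int, int]:
    """Son -> father map produced by an ordering (best vertex first).

    Each vertex's father is its highest-ranked neighbor, provided that
    neighbor lies above it; local optima get no entry (they are roots).
    """
    rank = {v: r for r, v in enumerate(ordering)}  # rank 0 = largest f
    father = {}
    for v in ordering:
        best = min((v ^ (1 << i) for i in range(n)), key=rank.__getitem__)
        if rank[best] < rank[v]:
            father[v] = best
    return father


def mean_pathlength(ordering: List[int], n: int) -> Fraction:
    """Mean pathlength of the OAT: expected OA iterations from a uniform start."""
    father = oat_fathers(ordering, n)

    def pathlength(v: int) -> int:
        length = 1  # the root has pathlength 1
        while v in father:
            v, length = father[v], length + 1
        return length

    return Fraction(sum(pathlength(v) for v in ordering), len(ordering))


order = [0b000, 0b100, 0b010, 0b001, 0b110, 0b101, 0b011, 0b111]
print(mean_pathlength(order, 3))  # 5/2
```

Exact fractions are used so that values such as 2½ can be compared without rounding.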


If f is not local-global, the rules for producing the OAT will instead produce
an OAF, or Optimal Adjacency Forest, with one tree per local optimum.


There  exist  classes  of  OATs  which  are  exponentially  high,  which  implies  that 
the  mean  pathlength  is  also.  (Note:  the  height  of  a  tree  is  the  maximum  height  of 
the  vertices  in  the  tree.  A  class  of  trees  is  exponentially  high  if  the  heights  of  the 
trees  in  the  class  grow  exponentially  in  n.)  The  study  of  worst  case  OATs  is  closely 
related  to  the  study  of  snakes-in-boxes;  results  from  the  latter  can  be  applied  to  the 
former with little change. In the next chapter we show that a lower bound on the
maximum height of an OAT of order n is given by

$$\frac{7 \cdot 2^{n}}{4(n-1)} - 1$$

for n > 6. This comes from a lower bound on snakes-in-boxes due to Victor Klee
(Klee, 1970).


Since there exist OATs that are exponentially high, any guaranteed (worst case)
bound on the performance of the OA algorithm must be exponential in n. If the worst case OAT
is in some sense rarely encountered, it is of interest to ask, what is the expected
performance  of  the  optimal  adjacency  algorithm?  Or,  equivalently,  what  is  the 
expected  mean  pathlength  of  an  OAT  of  order  n?  This  question  is  of  course  not 
well  defined  without  a  notion  of  an  underlying  probability  distribution  of  OATs. 
There  is  strong  empirical  evidence  that  under  a  variety  of  underlying  distributions, 
the  expected  mean  pathlength  of  an  OAT  is  linear  in  n.  The  next  section  defines 
the  necessary  terminology  for  describing  these  underlying  probability  distributions. 

1.3 LG Orderings and the Boundary

When we construct an OAT from the function f, we are not interested in the
specific numeric values of f, but in the ordering of values of f on the vertices.
Since function values are distinct, the vertices can be uniquely ordered from high
to low function value. Such a list of vertices is called an ordering. For our
purposes, the ordering of the vertices defines f. If f were as in Example A,

Example A (ordering of the 3-cube, highest value first; the earlier entries are missing from this copy)

. . .
101
111
110

then the rules for OATs would produce the OAT we looked at in Figure 1.2. In this
case we say the ordering produces the OAT.

Not every ordering produces an OAT; many produce OAFs instead. For instance,
Example B,

00
11
01
10

Example B

is an ordering for n = 2 which produces the OAF

[OAF: a tree rooted at 00 with sons 10 and 01, and the isolated root 11]

because 00 and 11 are not adjacent.

If  an  ordering  produces  an  OAT  it  is  said  to  be  an  LG  ordering.  An  obvious 
necessary  and  sufficient  condition  that  an  ordering  be  LG  is  that  every  vertex  except 
the  first  have  an  adjacent  vertex  that  is  located  higher  up  in  the  ordering.  Example 
B  fails  to  be  LG  because  all  of  the  neighbors  of  the  vertex  11  are  located  below  it 
in  the  ordering. 
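The LG condition and the number of trees in the OAF can both be read off an ordering. In this sketch (hypothetical names; vertices as n-bit integers; ordering listed best first), a local optimum is a vertex none of whose neighbors lies above it. The OAF therefore has one tree per entry of `local_optima`, and the ordering is LG exactly when that list contains only the first vertex, which is always a local optimum.

```python
from itertools import permutations
from typing import List


def local_optima(ordering: List[int], n: int) -> List[int]:
    """Vertices whose n neighbors all lie below them in the ordering."""
    rank = {v: r for r, v in enumerate(ordering)}
    return [v for v in ordering
            if all(rank[v ^ (1 << i)] > rank[v] for i in range(n))]


def is_lg(ordering: List[int], n: int) -> bool:
    """LG iff every vertex except the first has a neighbor above it."""
    return local_optima(ordering, n) == [ordering[0]]


example_b = [0b00, 0b11, 0b01, 0b10]
print(local_optima(example_b, 2))  # [0, 3]: roots 00 and 11
print(sum(is_lg(list(p), 2) for p in permutations(range(4))))  # 16 of the 24 orderings
```

The count of 16 out of 4! = 24 agrees with the proportion 2/3 reported for n = 2 in Chapter 2.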

Each  LG  ordering  produces  only  one  OAT,  but  given  an  OAT,  there  can  exist 
many  orderings  which  produce  it.  For  example,  if  the  bottom  two  vertices  in 
Example  A  traded  places  in  the  ordering,  the  resulting  OAT  would  still  be  Figure 
1.2. 

How  can  we  test  an  ordering  for  LG?  We  verify  that  each  of  its  vertices  except 
the  top  one  has  at  least  one  neighbor  above  it.  How  can  we  produce  an  ordering 
that  is  local-global?  One  method  would  be  to  generate  orderings  at  random  until 
one  is  reached  that  passes  the  above  test  and  is  therefore  LG.  This  method,  though 
simple,  has  the  disadvantage  that  the  proportion  of  orderings  that  are  LG  becomes 
vanishingly small as n increases. There is a better method that produces only LG
orderings. However, before presenting it, we need one more concept: the boundary.

Suppose we had an LG ordering written down somewhere, and all we could
remember were the first five vertices. What could we deduce about the sixth? The
sixth vertex would have to be adjacent to one or more of the first five, and distinct
from them. If S is a subset of the vertices of the n-cube, we define the boundary
of S to consist of all vertices x such that x is not in S and such that for some y
in S, x and y are adjacent. It is obvious that the sixth vertex in any LG ordering
must belong to the boundary of the first five; in general the (i + 1)st vertex in an
LG ordering must be an element of the boundary of the first i vertices. To
recursively enumerate all LG orderings, then, we use the following procedure.

Procedure Enumerate (When this procedure is called it is passed the values of
n, i, and the first i values of an array A[1, ..., 2^n] of vertices.)

0. Begin

1. If i = 2^n, output A and go to 5.

2. Compute the boundary, B, of A[1], ..., A[i].

3.  Set  i  :=  i+1. 

4.  For  each  member  x  of  B,  Do 

Begin 

Set  A[i]  :=  x 

Call  procedure  enumerate 

End 

5.  End 
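Procedure Enumerate translates almost line for line into a recursive Python generator. The sketch below is an illustration, not the author's code: vertices are n-bit integers, the names `boundary` and `enumerate_lg` are hypothetical, the boundary is recomputed from scratch at each call (simple but slow), and the outer loop supplies the arbitrary first vertex A[1], which the procedure assumes has already been chosen.

```python
from typing import Iterable, Iterator, List, Set


def boundary(chosen: Iterable[int], n: int) -> Set[int]:
    """Vertices not in `chosen` that are adjacent to some member of `chosen`."""
    chosen = set(chosen)
    return {x ^ (1 << i) for x in chosen for i in range(n)} - chosen


def enumerate_lg(n: int) -> Iterator[List[int]]:
    """Yield every LG ordering of the n-cube (best vertex first)."""
    size = 1 << n

    def extend(a: List[int]) -> Iterator[List[int]]:
        if len(a) == size:                  # step 1: i = 2^n, output A
            yield list(a)
            return
        for x in sorted(boundary(a, n)):    # steps 2-4: branch on each x in B
            a.append(x)                     # A[i + 1] := x
            yield from extend(a)            # recursive call
            a.pop()

    for first in range(size):
        yield from extend([first])


print(sum(1 for _ in enumerate_lg(2)))  # 16, i.e. 2/3 of 4!
print(sum(1 for _ in enumerate_lg(3)))  # 8640, i.e. 3/14 of 8!
```

Both counts match the proportions of LG orderings quoted in Section 2.3 for n = 2 and n = 3.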

If  we  want  to  produce  one  LG  ordering  randomly,  we  change  step  4  to  “For 
some  member  x  of  B” .  Then  at  each  step  of  the  random  generation  process,  a  vertex 
x  is  selected  from  the  boundary,  assigned  a  father,  and  attached  to  the  tree. 

How we select x determines what the underlying distribution is. We could
(theoretically speaking) choose x so as to give each LG ordering an equal chance
of being produced: this distribution is called “all LG orderings equally likely.” We
could let each member of the boundary have an equal chance of being chosen: this
distribution is called “boundary members equally likely” or just the “boundary”
distribution for short. It can be thought of in the following way: if we know what
the i best points are (best in the sense of best objective function value), then if the
function is going to be LG, the (i + 1)st best point must be a member of the boundary
set of these i points; saying that these boundary points all have equal probability
of being that (i + 1)st best point is exactly what is meant by “boundary members
equally likely.”

Note: “boundary members equally likely” is not the same as “all LG orderings
equally likely,” because the size of the boundary at stage i varies depending on
what the previous choices have been. Under the boundary distribution, the
probability of an ordering is the product, over the stages, of one over the number
of choices available at that stage. Thus, an ordering that has an unusually large
boundary in the early stages would be more likely to occur under the
LG-orderings-equally-likely distribution than under the boundary distribution.


Every boundary member has at least one neighbor that has already been
selected, but some have more such neighbors than others (n at most, since each
vertex has n neighbors). An alternate criterion would be to give each boundary member
a weight proportional to the number of chosen neighbors it has; thus vertices with
more chosen neighbors are more likely to be chosen themselves. For reasons that
will become clear later, this distribution is called the coboundary distribution. This
notion of randomness may seem slightly preferable to the boundary distribution
if one thinks that one can judge a vertex by its neighbors (not an unreasonable
supposition for a local-global problem).
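Both random-generation rules can be sketched with one routine. It is an illustration under stated assumptions: the first vertex is drawn uniformly (the text leaves it open), vertices are n-bit integers, and the names `random_lg_ordering` and `weighting` are hypothetical. A counter keeps, for every boundary member, its number of already-chosen neighbors; the boundary rule ignores these counts, while the coboundary rule uses them as weights.

```python
import random
from collections import Counter
from typing import List, Optional


def random_lg_ordering(n: int, weighting: str = "boundary",
                       rng: Optional[random.Random] = None) -> List[int]:
    """Draw one LG ordering of the n-cube, best vertex first.

    weighting="boundary":   every boundary member is equally likely.
    weighting="coboundary": a boundary member's weight is its number of
                            already-chosen neighbors.
    """
    rng = rng if rng is not None else random.Random()
    size = 1 << n
    first = rng.randrange(size)          # assumption: uniform first vertex
    ordering, chosen = [first], {first}
    # chosen_nbrs[x] = number of chosen neighbors of an unchosen vertex x;
    # its keys are exactly the current boundary.
    chosen_nbrs = Counter(first ^ (1 << i) for i in range(n))
    while len(ordering) < size:
        candidates = list(chosen_nbrs)
        if weighting == "boundary":
            x = rng.choice(candidates)
        elif weighting == "coboundary":
            weights = [chosen_nbrs[c] for c in candidates]
            x = rng.choices(candidates, weights=weights)[0]
        else:
            raise ValueError(f"unknown weighting: {weighting!r}")
        ordering.append(x)
        chosen.add(x)
        del chosen_nbrs[x]
        for i in range(n):
            y = x ^ (1 << i)
            if y not in chosen:
                chosen_nbrs[y] += 1
    return ordering


print(random_lg_ordering(3, "coboundary", random.Random(1)))
```

Maintaining the counts incrementally keeps each step at O(n) updates instead of recomputing the boundary.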


1.4 BATs


We have now defined three different possible distributions on local-global
problems. From an ordering we can deduce the OAT, which tells us exactly how many
iterations the OA algorithm will take. Unfortunately, it is hard to analyze OATs
theoretically because an OAT is so tightly constrained; what nodes can be attached
in one place often depends on what nodes are attached somewhere else. For instance,
the tree in Figure 1.3 is not an OAT:

Figure 1.3


The tree implies f(11) < f(01) < f(00), so f(11) < f(00). By the optimal
adjacency rule, the father of 10 must be the member of the set {00, 11} which has
the larger function value. The above inequality implies that 00 should be the father,
but in Figure 1.3 the father is 11 instead. This is a contradiction, so the tree is not
an OAT. To make it an OAT, the node 10 would have to be moved and attached to
the node 00. A more subtle example is shown below in Figure 1.4.

Figure 1.4


Consider the node 101: its three neighbors are 100, 001, and 111. By definition
of the optimal adjacency rule, the one of these with largest function value must
be the father of 101. Looking at Figure 1.4, we see that the father is 100. This
implies that f(100) > f(001). However, we can deduce by the same reasoning
that f(001) > f(010) and that f(010) > f(100). Together these relations yield
f(100) > f(001) > f(010) > f(100), a contradiction. The tree in Figure 1.4 is
therefore not an OAT. The problem is that the OA algorithm must choose the best
neighbor to go to. Suppose we relaxed this condition, and only required that the
algorithm proceed from a vertex to a better adjacent vertex. The trees in Figures
1.3 and 1.4 are consistent with the action of such an algorithm. Formally, we define
the Better Adjacency (BA) algorithm as follows:

BA  (Better  Adjacency)  Algorithm 

1. Start at a random vertex x.

2.  Search  through  x’s  neighbors  until  a  better  one,  y,  is  found  or  all  neighbors 
have  been  tried.  In  the  former  case  set  x  equal  to  y  and  iterate  (2.);  in  the  latter 
case  stop  with  x  optimal. 
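A sketch of the BA algorithm in the same style follows (hypothetical names, vertices as n-bit integers, ties broken by comparing (f(v), v)). The text does not fix the order in which neighbors are searched; scanning them in random order is an assumption made here, and the starting vertex is passed in rather than drawn inside the function.

```python
import random
from typing import Callable, Optional, Tuple


def better_adjacency(f: Callable[[int], float], n: int, start: int,
                     rng: Optional[random.Random] = None) -> Tuple[int, int]:
    """Run the BA algorithm from `start`; return (local optimum, iterations)."""
    rng = rng if rng is not None else random.Random()
    key = lambda v: (f(v), v)
    x, iterations = start, 0
    while True:
        iterations += 1
        order = list(range(n))
        rng.shuffle(order)            # assumed search order: random
        for i in order:
            y = x ^ (1 << i)
            if key(y) > key(x):       # a better neighbor: move to it
                x = y
                break
        else:                         # every neighbor tried: x is optimal
            return x, iterations


# Hypothetical LG test function: minus the Hamming distance to the vertex 1010.
f = lambda x: -bin(x ^ 0b1010).count("1")
print(better_adjacency(f, 4, 0b0101, random.Random(0)))  # (10, 5)
```

For this particular f every improving move lowers the distance to 1010 by one, so the iteration count does not depend on the search order; for general LG functions it does.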


This algorithm is valid for the same reasons as the OA algorithm; its
representation is the Better Adjacency Tree, or BAT. Given any LG ordering, we can
pick a higher valued neighbor for each vertex (except the first vertex in the ordering)
and make that neighbor its father. The resulting graph is a tree, since it has no
cycles and there is a path connecting every node to the first vertex; it is a BAT.

For example, the tree in Figure 1.3 could have been generated from the ordering
00, 01, 11, 10.

The tree in Figure 1.4 could have been generated from the ordering

Of course, the set of BATs properly contains the set of OATs.

An OAT can be thought of as a directed graph H = (V, F) whose vertex set V
consists of the vertices of the n-cube, and whose directed edges consist of the
(son, father) pairs in the OAT. Suppose we have an OAT and an ordering which
produces it. If we assign to each vertex x a value π_x equal to its position in the
ordering, then for every edge (x, y) in the graph, it must be true that π_x > π_y.
The existence of such π values is well known to be a necessary and sufficient
condition that a directed graph be acyclic, i.e., that the graph has no directed
cycles. Note that the condition is sufficient: if the graph H is acyclic we can derive
an ordering that corresponds to it. The fact that we can “go the other way” and
derive an ordering from the tree allows us to define OATs and BATs in graph
theoretic terminology without explicit reference to function values. These
definitions have the advantage of clarifying the distinction between OATs and BATs.

Let G = (V, E) be a graph of the n-cube so that the nodes of G are the vertices
of the n-cube, and there is an edge joining two vertices iff they are adjacent. Let
H = (V, F) be a directed subgraph of G with outdegree at most 1 (no node has more
than one edge leaving it) and with |F| = |V| − 1 edges. Then H is a BAT iff it is
acyclic. (The directed edges correspond to (son, father) pairs of the BAT.) Now
define H' = (V, F') so that F is contained in F' and, for every directed edge (u, v)
in F and every edge (u, w) in G with w ≠ v, the edge (w, v) is in F'. Note that H'
is not a subgraph of G, because it contains these neighbor-to-father edges. Then H
is an OAT iff H' is acyclic. The intuitive meaning is that when building BATs the
only information in a (son, father) pair is that f(son) < f(father); when building
OATs, a (son, father) pair also implies that f(father) > f(any other neighbor of the
son). In both cases a cycle means there is a contradiction via transitivity of “>”.
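The graph-theoretic characterization lends itself to a direct test. In this sketch (hypothetical names; vertices as n-bit integers; a candidate tree given as a son-to-father dict), H is checked for acyclicity, then H' is formed by adding an edge (w, v) for every edge (u, v) of H and every other neighbor w of u, and is checked in turn. Cycle detection uses an iterative three-color depth-first search, a standard technique rather than anything specified in the text.

```python
from typing import Dict, Iterable, List, Tuple


def has_cycle(nodes: Iterable[int], edges: Iterable[Tuple[int, int]]) -> bool:
    """Iterative three-color depth-first search for a directed cycle."""
    nodes = list(nodes)
    adj: Dict[int, List[int]] = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = dict.fromkeys(nodes, WHITE)
    for s in nodes:
        if color[s] != WHITE:
            continue
        color[s] = GRAY
        stack = [(s, iter(adj[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if color[v] == GRAY:       # back edge: a directed cycle
                    return True
                if color[v] == WHITE:
                    color[v] = GRAY
                    stack.append((v, iter(adj[v])))
                    break
            else:                          # all successors of u finished
                color[u] = BLACK
                stack.pop()
    return False


def is_bat(father: Dict[int, int], n: int) -> bool:
    """H: |V| - 1 edges, each joining neighbors, and no directed cycle."""
    size = 1 << n
    return (len(father) == size - 1
            and all(bin(u ^ v).count("1") == 1 for u, v in father.items())
            and not has_cycle(range(size), father.items()))


def is_oat(father: Dict[int, int], n: int) -> bool:
    """H is an OAT iff it is a BAT and H' is acyclic."""
    if not is_bat(father, n):
        return False
    edges = list(father.items())
    for u, v in father.items():
        for i in range(n):
            w = u ^ (1 << i)
            if w != v:                     # f(v) must beat every other neighbor w of u
                edges.append((w, v))
    return not has_cycle(range(1 << n), edges)


figure_1_3 = {0b01: 0b00, 0b11: 0b01, 0b10: 0b11}
print(is_bat(figure_1_3, 2), is_oat(figure_1_3, 2))  # True False
```

For the Figure 1.3 tree, H' contains the cycle 00 → 11 → 01 → 00, which is exactly the contradiction derived by hand earlier.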

Since  only  one  OAT  can  be  generated  from  an  ordering,  the  expected  mean 
pathlength  of  an  OAT  from,  say,  the  boundary  distribution  is  well  defined.  This 
is  not  true  for  BATs  because  usually  many  BATs  can  be  generated  from  a  single 
ordering.  To  resolve  the  ambiguity,  we  adopt  the  convention  that  a  BAT  is  to  be 
randomly  generated  from  an  ordering  by  choosing  from  among  the  possible  fathers 
with  equal  probability,  for  each  vertex.  We  can  now  discuss  the  expected  mean 
pathlengths  of  OATs  and  BATs  from  the  coboundary  and  other  distributions. 
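The convention for generating a BAT from an ordering can be sketched as follows (hypothetical name `random_bat`; vertices as n-bit integers; ordering listed best first). Each non-root vertex picks its father uniformly among the neighbors that lie above it; an ordering with a second local optimum has no valid choice and is rejected.

```python
import random
from typing import Dict, List, Optional


def random_bat(ordering: List[int], n: int,
               rng: Optional[random.Random] = None) -> Dict[int, int]:
    """Son -> father map of a BAT generated from `ordering` (best first).

    Each non-root vertex picks its father uniformly at random among its
    neighbors that lie above it in the ordering.
    """
    rng = rng if rng is not None else random.Random()
    rank = {v: r for r, v in enumerate(ordering)}
    father = {}
    for v in ordering[1:]:
        better = [v ^ (1 << i) for i in range(n) if rank[v ^ (1 << i)] < rank[v]]
        if not better:
            raise ValueError(f"vertex {v} is a local optimum: ordering is not LG")
        father[v] = rng.choice(better)
    return father


print(random_bat([0b00, 0b01, 0b11, 0b10], 2, random.Random(2)))
```

From the ordering 00, 01, 11, 10 the vertex 10 may get either 00 or 11 as its father, so this sketch yields either the OAT or the tree of Figure 1.3, each with probability one half.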


CHAPTER  2. 

SOME  PROPERTIES  OF  OATS  AND  BATS 


2.1  Best  Case  Trees 


The  purpose  of  this  chapter  is  to  explore  some  combinatorial  properties  of  OATs 
and  BATs.  The  later  chapters  do  not  depend  logically  on  any  of  this  material,  but 
we  hope  that  it  will  serve  to  give  the  reader  more  familiarity  with  the  structures 
defined  in  Chapter  1.  The  first  topic  we  deal  with  is  the  best  case  for  a  local 
improvement  algorithm. 


In  an  instance  of  an  LG  problem  with  best  possible  structure,  the  algorithm 
would  decrease  the  Hamming  distance  to  the  optimum  point  at  every  iteration.  If 
the  origin  were  the  optimum,  we  would  always  change  ones  to  zeroes  but  never  zeroes 
to  ones.  Since  on  the  average  the  starting  vertex  will  have  half  of  its  coordinates 
equal  to  one,  we  would  expect  the  average  number  of  iterations  to  be  about  n/2. 
We  have  defined  the  pathlength  of  the  root  to  be  1  (it  takes  one  iteration  to  verify 
optimality),  so  the  exact  value  of  the  lowest  possible  mean  pathlength  is  in  fact 
1  +  n/2. 


One  tree  structure  which  reflects  this  best  case  situation  is  the  binomial  tree 
(see  Knuth,  1973).  The  binomial  tree  of  order  n  is  constructed  inductively  by 
appending  the  binomial  trees  of  order  n  —  1,  n  —  2,  ...  ,0  to  a  root  vertex.  Figure 
2.1  shows  the  binomial  trees  of  orders  one,  two,  and  three. 
Figure 2.1


One of the orderings that generates a binomial tree can be found by listing
the vertices in the tree level by level (starting at the root), and going from left to
right on each level. There are of course other orderings that would yield the same
binomial tree. In general one can employ the topological sort method (as described
in Knuth, 1973) to produce other orderings which yield the same OAT as a given
ordering.
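One concrete labeling of the binomial tree on the n-cube, assumed here for illustration, makes the father of a nonzero vertex x equal to x with its lowest-order 1 bit cleared. Every move then removes one 1, so a vertex of Hamming weight w has pathlength w + 1, and averaging over the 2^n vertices gives the best case value 1 + n/2. The root's sons are the vertices 2^k, and the subtree under 2^k holds the 2^k vertices whose highest 1 bit is bit k, so the subtrees are binomial trees of orders 0, 1, ..., n − 1. The names below are hypothetical.

```python
from fractions import Fraction
from typing import Dict


def binomial_tree(n: int) -> Dict[int, int]:
    """Son -> father map: clear the lowest-order 1 bit of each nonzero vertex."""
    return {x: x & (x - 1) for x in range(1, 1 << n)}


def mean_pathlength(father: Dict[int, int], n: int) -> Fraction:
    def pathlength(v: int) -> int:
        length = 1  # the root has pathlength 1
        while v in father:
            v, length = father[v], length + 1
        return length
    return Fraction(sum(pathlength(v) for v in range(1 << n)), 1 << n)


def subtree_size(father: Dict[int, int], r: int, n: int) -> int:
    """Number of vertices whose path to the root passes through r."""
    count = 0
    for v in range(1 << n):
        while v != r and v in father:
            v = father[v]
        count += v == r
    return count


print(mean_pathlength(binomial_tree(3), 3))                      # 5/2
print([subtree_size(binomial_tree(4), 1 << k, 4) for k in range(4)])  # [1, 2, 4, 8]
```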

The binomial tree is not the only possible OAT with smallest mean pathlength.
In Figure 2.2 we show an OAT with mean pathlength 1 + n/2 for n = 3 which is
not a binomial tree.

Figure 2.2


The best possible mean pathlength of a BAT is of course also 1 + n/2. The
example of a BAT that fails to be an OAT, Figure 1.4, is one of many such trees. The
total number of best case BATs is fairly large. Since there are C(n, k) nodes positioned
k levels below the root, and each of these nodes has k possible fathers on the next
level up, there exist

$$\prod_{k=1}^{n} k^{\binom{n}{k}}$$

best case BATs whose root is 00...0, and there are 2^n times as many altogether.
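The count above can be evaluated directly and, for small n, checked by brute force. The sketch (hypothetical names; vertices as n-bit integers, root 00...0) enumerates every way of giving each nonzero vertex a father among its neighbors, keeps the assignments that form a tree with mean pathlength 1 + n/2, and compares the total with the product formula. The brute force is exponential and only meant for n ≤ 3.

```python
from fractions import Fraction
from itertools import product as cartesian
from math import comb, prod


def best_case_bat_count(n: int) -> int:
    """prod_{k=1}^{n} k ** C(n, k): best case BATs rooted at 00...0."""
    return prod(k ** comb(n, k) for k in range(1, n + 1))


def brute_force_count(n: int) -> int:
    size = 1 << n
    target = 1 + Fraction(n, 2)
    count = 0
    for bits in cartesian(range(n), repeat=size - 1):
        father = {v: v ^ (1 << i) for v, i in zip(range(1, size), bits)}
        total = 0
        for v in range(size):
            length = 1
            while v in father and length <= size:  # bound the walk in case of a cycle
                v, length = father[v], length + 1
            if v != 0:                             # never reached the root: not a tree
                break
            total += length
        else:
            if Fraction(total, size) == target:
                count += 1
    return count


for n in (1, 2, 3):
    print(n, best_case_bat_count(n), brute_force_count(n))
```

Since every pathlength is at least one plus the Hamming distance to the root, a tree reaches the mean 1 + n/2 only if each vertex's father is one step closer to the root, which is why the two counts agree.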


2.2  Worst  Case  Trees 

In  contrast  with  the  best  case,  the  problem  of  finding  the  largest  possible  height 
or  mean  pathlength  of  an  OAT  is  an  unsolved  combinatorial  problem.  Even  for 
small  values  of  n  it  is  not  a  trivial  problem  to  find  an  OAT  of  greatest  height;  to 
give  an  idea  of  how  quickly  OATs  become  complicated  we  show  a  worst  case  OAT 
of  order  4  in  Figure  2.3. 


Figure  2.3 
Any path from a vertex to the root of an OAT must satisfy what we call the
grandfather clause: the vertex may not be adjacent to any vertices in the path
except its father. If the vertex were adjacent to its great grandfather, for instance,
it would not be the son of its optimal adjacent point, since the great grandfather
must have a better function value than the father. Therefore, a vertex in a path
of an OAT is adjacent to its father and its son, but not to any other vertices in
the path. This property is similar to the idea of a snake-in-a-box. An n-dimensional
snake-in-a-box is defined to be a tour, or circuit, of vertices on an n-cube with the
property that every vertex in the tour is adjacent to exactly two other vertices in
the tour (the ones immediately preceding and succeeding it). If we remove one
vertex from a snake-in-a-box, we have a path that satisfies the grandfather clause.
It  is  easy  to  construct  an  OAT  which  contains  some  particular  path:  for  instance, 
make  the  vertices  in  the  path  the  first  vertices  in  the  ordering.  The  other  vertices 
can  be  placed  below  them  in  the  ordering  in  any  way  that  does  not  violate  the  LG 
property,  and  the  OAT  produced  from  such  an  ordering  will  contain  the  given  path. 
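The construction in the last paragraph can be sketched as follows (hypothetical names; vertices as n-bit integers). `satisfies_grandfather_clause` checks that consecutive path vertices are adjacent and no other pair is; `ordering_containing_path` puts the path at the top of the ordering, root first, and appends the remaining vertices in breadth-first order, which is one of many ways to keep the ordering LG. The example path 000, 001, 011, 111, 110 is the 6-vertex snake 000, 001, 011, 111, 110, 100 in the 3-cube with 100 removed, so its last vertex should get pathlength 5 = L − 1.

```python
from typing import Dict, List


def adjacent(a: int, b: int) -> bool:
    return bin(a ^ b).count("1") == 1


def satisfies_grandfather_clause(path: List[int]) -> bool:
    """True iff consecutive vertices are adjacent and no other pair is."""
    return all(adjacent(path[i], path[j]) == (j == i + 1)
               for i in range(len(path)) for j in range(i + 1, len(path)))


def ordering_containing_path(path: List[int], n: int) -> List[int]:
    """Path first (root first), then the other vertices breadth first."""
    ordering, seen, frontier = list(path), set(path), list(path)
    while frontier:
        nxt = []
        for x in frontier:
            for i in range(n):
                y = x ^ (1 << i)
                if y not in seen:
                    seen.add(y)
                    ordering.append(y)
                    nxt.append(y)
        frontier = nxt
    return ordering


def oat_fathers(ordering: List[int], n: int) -> Dict[int, int]:
    """Each vertex's father is its best neighbor, if that neighbor is above it."""
    rank = {v: r for r, v in enumerate(ordering)}
    father = {}
    for v in ordering:
        best = min((v ^ (1 << i) for i in range(n)), key=rank.__getitem__)
        if rank[best] < rank[v]:
            father[v] = best
    return father


path = [0b000, 0b001, 0b011, 0b111, 0b110]
father = oat_fathers(ordering_containing_path(path, 3), 3)
v, length = path[-1], 1
while v in father:
    v, length = father[v], length + 1
print(satisfies_grandfather_clause(path), length)  # True 5
```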

Therefore, given a snake-in-a-box containing L vertices, we can always construct
an OAT whose height is at least L − 1. Victor Klee (Klee, 1970) has shown that,
for n > 6, there exists a snake-in-a-box of length at least

$$\frac{7 \cdot 2^{n}}{4(n-1)}.$$

(The greatest length is unknown.) Subtracting one from this number gives the lower
bound on the maximum height of an OAT which was stated in Chapter 1.
BATs are less tightly constrained than OATs and hence are often easier to
analyze. In particular, BATs need not satisfy the grandfather clause. The worst
case BAT is a single path through all 2^n vertices; its height is 2^n, and it is
exemplified by the Gray code (see Reingold et al., 1977, and Cottle, 1978).
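The single-path worst case can be illustrated with the binary reflected Gray code g(i) = i XOR (i >> 1), a standard construction assumed here rather than taken from the text. Consecutive code words differ in one bit, so making each word the son of the next word gives a BAT whose root is the last word (the ordering is the code reversed) and whose height is 2^n. Names are hypothetical; vertices are n-bit integers.

```python
from typing import Dict, List


def gray_code(n: int) -> List[int]:
    """Binary reflected Gray code: all 2**n vertices, consecutive ones adjacent."""
    return [i ^ (i >> 1) for i in range(1 << n)]


def gray_code_bat(n: int) -> Dict[int, int]:
    """Son -> father map of a single-path BAT; its ordering is the code reversed."""
    code = gray_code(n)
    return {code[i]: code[i + 1] for i in range(len(code) - 1)}


def height(father: Dict[int, int], n: int) -> int:
    def pathlength(v: int) -> int:
        length = 1
        while v in father:
            v, length = father[v], length + 1
        return length
    return max(pathlength(v) for v in range(1 << n))


print([height(gray_code_bat(n), n) for n in range(1, 6)])  # [2, 4, 8, 16, 32]
```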


2.3  The  Local-Global  Property 

The total number of orderings of order n is (2^n)!. For very small values of
n the orderings can be enumerated to discover the number of them that are LG.
When n equals two, 2/3 of them are LG; when n equals three, 3/14 of them are LG.
As n increases, this proportion decreases rapidly. However, the sequence does not
appear to follow any simple combinatorial pattern. We can estimate the proportion
of orderings that are LG in the following way: an ordering is LG if every vertex in
the ordering, except the root, has a neighbor above it. Thus the probability of a
random ordering's being LG is the probability of the intersection of these events,


$$\mathrm{Prob}\left[\bigcap_{k=2}^{2^n} \{\text{the } k\text{th vertex is not a local optimum}\}\right] \qquad (2.3.1)$$

We make the (false) simplifying assumption that these events are independent and
approximate (2.3.1) by

$$\prod_{k=2}^{2^n} P(k), \qquad (2.3.2)$$

where P(k) denotes the probability that the kth vertex in the ordering is not a local
optimum. Next we approximate the P(k) by

$$P(2^n - k) = 1 - \binom{k}{n} \Big/ \binom{2^n - 1}{n} \approx 1 - \left(\frac{k}{2^n}\right)^{n}. \qquad (2.3.3)$$


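The left hand side of (2.3.3) is one minus the probability that the n neighbors of
the vertex will be arranged in the k lowest positions in the ordering (of the
C(2^n − 1, n) equally likely sets of positions for the neighbors, C(k, n) lie among the
k lowest), which is one minus the probability that the vertex in the (k + 1)st lowest
position is a local optimum. Substituting (2.3.3) into (2.3.2), and re-indexing by the
number i of vertices below each vertex, the estimate of the probability of
occurrence of the LG property becomes

$$\prod_{i=0}^{2^n-2}\left(1-\left(\frac{i}{2^n}\right)^{n}\right) = \exp\left(\sum_{i=0}^{2^n-2}\log\left(1-\left(\frac{i}{2^n}\right)^{n}\right)\right) \approx \exp\left(-\sum_{i=0}^{2^n-2}\left(\frac{i}{2^n}\right)^{n}\right) \approx e^{-2^n/(n+1)}, \qquad (2.3.4)$$

where the last step uses $\sum_i (i/2^n)^n \approx 2^n \int_0^1 x^n\,dx = 2^n/(n+1)$.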


This  estimate  indicates  that  the  LG  property  is  quite  special.  The  chances  of  a
random  ordering’s  being  LG  become  vanishingly  small  as  n  gets  large.  Experimental
results  suggest  that  the  estimate  is  somewhat  high.  This  makes  sense  because  the
estimate  is  based  on  the  assumption  that  the  events  “the  kth  vertex  is  not  a  local
optimum”  are  independent.  In  fact  these  events  are  negatively  correlated,  for  if  a
vertex  is  not  a  local  optimum,  its  neighbors  have  a  slightly  higher  chance  of  being 
one.  The  effect  of  the  simplifying  assumption,  therefore,  would  be  to  increase  the 
estimate. 
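The small-n proportions quoted above can be checked by brute force and set against the estimates. The sketch below is our illustration, not code from this report: vertices of the n-cube are integers whose bits are the coordinates, and an ordering counts as LG when every vertex after the first has a neighbor earlier in the ordering, as defined above. Enumerating all $(2^n)!$ orderings is only practical for $n \le 3$. The product form (2.3.2) with the exact middle expression of (2.3.3) gives 2/3 for n = 2 and about 0.263 for n = 3, against the exact 3/14 ≈ 0.214, in line with the remark that the estimate runs high; the closed form $e^{-2^n/(n+1)}$ is an asymptotic approximation and is crude at such small n.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, exp, prod

def is_lg(order, n):
    """True if every vertex after the first has an n-cube neighbor earlier in the order."""
    seen = set()
    for pos, v in enumerate(order):
        if pos > 0 and not any((v ^ (1 << b)) in seen for b in range(n)):
            return False          # v would be a second local optimum
        seen.add(v)
    return True

def lg_fraction(n):
    """Exact fraction of the (2^n)! orderings that are LG (brute force, n <= 3)."""
    lg = total = 0
    for order in permutations(range(1 << n)):
        total += 1
        lg += is_lg(order, n)
    return Fraction(lg, total)

def independence_estimate(n):
    """Product (2.3.2), using the exact middle expression of (2.3.3)."""
    N = 1 << n
    return prod(1 - comb(k, n) / comb(N - 1, n) for k in range(N - 1))

def closed_form_estimate(n):
    """The final approximation e^{-2^n/(n+1)} of (2.3.4)."""
    return exp(-(1 << n) / (n + 1))

for n in (2, 3):
    print(n, lg_fraction(n), round(independence_estimate(n), 4), round(closed_form_estimate(n), 4))
```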

We  should  point  out  that  the  assumption  that  all  orderings  are  equally  likely  to 
occur  is  not  necessarily  very  realistic.  In  many  problems,  function  values  of  adjacent 
vertices  tend  to  be  relatively  close  to  one  another.  A  more  realistic  simplifying 
assumption,  then,  might  be  that  the  function  values  of  a  point’s  neighbors  are 
independently  distributed,  each  with  an  equal  chance  of  being  better  or  worse.  In
this  case,  the  probability  that  a  vertex  is  a  local  optimum  is  $2^{-n}$,  so  the  probability
that  the  ordering  is  LG  is  approximately  $(1-2^{-n})^{2^n-1} \approx 1/e$.  Unfortunately  this  assumption  is
not  self-consistent,  but  it  is  interesting  because  it  suggests  a  relationship  between 
problems  that  are  LG  and  problems  in  which  adjacent  vertices  tend  to  have  similar 
function  values. 
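The 1/e figure comes from letting each of the $2^n - 1$ non-root vertices avoid being a local optimum with probability $1 - 2^{-n}$ and (as a further simplification, in the same spirit as the independence assumption above) multiplying these probabilities. A quick numerical check of the limit, as a sketch:

```python
import math

def lg_probability_independent_neighbors(n):
    """(1 - 2^-n)^(2^n - 1): every non-root vertex avoids being a local optimum."""
    return (1 - 2.0 ** -n) ** (2 ** n - 1)

for n in (2, 5, 10, 20):
    print(n, lg_probability_independent_neighbors(n), 1 / math.e)
```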

We  have  seen  that  the  concept  of  the  boundary  gives  us  a  simple  formulation
of  the  LG  property:  an  ordering  is  LG  if  and  only  if  its  ith  vertex  is  in  the  boundary
of  the  first  $i-1$  vertices,  for  $i = 2, \dots, 2^n$.  Fix  n.  If  $B(i)$  is  a  number  equal  to  the
size  of  the  smallest  possible  boundary  of  a  set  of  i  points,  then

$$\prod_{i=1}^{2^n-1} B(i) \tag{2.3.5}$$

must  be  a  lower  bound  on  the  total  number  of  LG  orderings.  In  the  next  chapter
we  derive  the  following  bounds:

$$B(i) \ge \binom{n}{k+1}, \qquad P_k \le i < P_{k+1}, \quad k \le \frac{n-3}{2},$$

where  $P_k$  denotes  the  sum  of  the  first  k  +  1  binomial  coefficients.  Substituting  into
(2.3.5),  we  get

$$\text{number of LG orderings} \ge \prod_{k=0}^{\lfloor (n-3)/2\rfloor}\binom{n}{k+1}^{\binom{n}{k+1}}. \tag{2.3.6}$$


Although  (2.3.6)  is  a  conservative  lower  bound,  it  is  clear  that  even  for  n  =  6  it 
would  not  be  feasible  to  perform  an  exhaustive  enumeration  of  LG  orderings. 
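For a concrete figure, one can substitute the full bounds of Boundary Theorem 3.4 (proved in Chapter 3) into (2.3.5) for every i, not only the lower half used in (2.3.6). The sketch below does this for n = 6 and reports the base-10 logarithm of the resulting lower bound; the helper names are ours, and the case analysis simply transcribes the statement of Theorem 3.4.

```python
from math import comb, log10

def partial_sums(n):
    """P_k = C(n,0) + ... + C(n,k) for k = 0..n."""
    sums, acc = [], 0
    for k in range(n + 1):
        acc += comb(n, k)
        sums.append(acc)
    return sums

def boundary_lower_bound(n, i):
    """Lower bound on the boundary of an i-point set, per Theorem 3.4 (1 <= i < 2^n)."""
    P = partial_sums(n)
    k = max(j for j in range(n + 1) if P[j] <= i)   # P_k <= i < P_{k+1}
    if i == P[k]:
        return comb(n, k + 1)                       # exact value at a partial sum
    return comb(n, k + 1) if 2 * k <= n - 3 else comb(n, k + 2)

def log10_lg_lower_bound(n):
    """log10 of the product (2.3.5) with B(i) replaced by the Theorem 3.4 bounds."""
    return sum(log10(boundary_lower_bound(n, i)) for i in range(1, 2 ** n))

print(round(log10_lg_lower_bound(6), 1))   # 58.8, i.e. more than 10^58 LG orderings
```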
CHAPTER  3.

BOUNDS  ON  EXPECTED  MEAN  PATHLENGTH 


3.1  Empirical  results 

3.1.1  Linearity  of  expected  mean  pathlength 

First  we  briefly  present  some  empirical  results.  From  simulations  performed
by  Craig  Tovey,  Frank  Heartney,  and  Charles  Fay,  it  appears  that  the  expected
mean  pathlength  of  both  OATs  and  BATs  is  linear  in  n  for  n  =  4, ...,  15  under  all
three  notions  of  randomness:  boundary,  coboundary,  and  all  LG  orderings  equally
likely  (see  the  graphs  on  the  following  pages).  Bad  cases  (large  heights)  appear  to
be  extremely  rare.  This  is  the  fundamental  empirical  result.  Though  over  200,000
OATs  altogether  were  generated,  the  largest  mean  pathlengths  and  heights  seen  were
just  two  to  three  times  the  average.  The  expected  value  of  the  mean  pathlengths
appears  to  be  linear  in  n  although  there  exist  cases  that  are  $O(2^n/n)$.  The  “bad”
cases  evidently  are  so  rare  that  it  doesn’t  matter  much  which  underlying  distribution 
is  used:  the  expected  mean  pathlength  remains  small.  The  empirical  results  lead  us 
to  conjecture  that  expected  mean  pathlength  is  less  than  n. 

Conjecture  3.1  Under  any  of  the  three  distributions  (boundary,  coboundary,  and  all  LG
orderings  equally  likely),  the  expected  mean  pathlength  of  both  OATs  and  BATs  is
less  than  n.  Equivalently,  the  expected  number  of  iterations  of  an  optimal  adjacency 
or  better  adjacency  algorithm  is  less  than  n  with  any  of  these  distributions. 
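The simulated quantities can be reproduced in outline. The sketch below is not the simulation code of Tovey, Heartney, and Fay; it is a minimal illustration under stated assumptions. It grows an LG ordering from the boundary distribution (each new vertex uniform over the current boundary) and attaches each new vertex to a father. For an OAT the father is the earliest-placed, i.e. best, neighbor; for a BAT we assume, as one plausible reading, a uniformly random already-placed neighbor. Heights follow the text’s convention that the root has height 1, and the names `tree_heights`, `mean_pathlength`, and `rule` are hypothetical.

```python
import random

def tree_heights(n, rule="oat", rng=None):
    """Heights H_1..H_{2^n} of an OAT or BAT grown under the boundary distribution.

    Vertices of the n-cube are integers; neighbors differ in one bit.
    rule="oat": father = earliest-placed neighbor (the best one).
    rule="bat": father = uniformly random placed neighbor (an assumption).
    """
    rng = rng or random.Random()
    pos = {0: 0}                 # vertex -> position in the ordering (root = vertex 0)
    height = {0: 1}              # the root has height 1
    seq = [0]
    boundary, index = [], {}     # boundary list plus vertex -> slot, for O(1) removal

    def push_neighbors(v):
        for b in range(n):
            w = v ^ (1 << b)
            if w not in pos and w not in index:
                index[w] = len(boundary)
                boundary.append(w)

    push_neighbors(0)
    while boundary:
        j = rng.randrange(len(boundary))   # uniform choice over the boundary
        v = boundary[j]
        last = boundary.pop()              # swap-remove v from the list
        if last != v:
            boundary[j] = last
            index[last] = j
        del index[v]
        placed = [v ^ (1 << b) for b in range(n) if (v ^ (1 << b)) in pos]
        if rule == "oat":
            father = min(placed, key=pos.__getitem__)
        else:
            father = rng.choice(placed)
        pos[v] = len(seq)
        seq.append(v)
        height[v] = height[father] + 1
        push_neighbors(v)
    return [height[v] for v in seq]

def mean_pathlength(n, rule, trials=100, seed=0):
    """Average over trials of the mean height of all 2^n nodes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        h = tree_heights(n, rule, rng)
        total += sum(h) / len(h)
    return total / trials

for n in range(4, 9):
    print(n, round(mean_pathlength(n, "oat"), 2), round(mean_pathlength(n, "bat"), 2))
```

Subtract 1 from the reported means if pathlength is to be counted in edges rather than in nodes.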


[Figures: BOUNDARY DISTRIBUTION; BOUNDARY AND COBOUNDARY DISTRIBUTIONS]
3.1.2  OATs  versus  BATs

The  empirical  results  cited  above  allow  us  to  compare  the  relative  merits  of
“optimal  adjacency”  and  “better  adjacency.”  Not  surprisingly,  OATs  have  a  smaller
average  mean  pathlength:  approximately  $0.79n - 0.2$  for  OATs  versus  $0.92n - 0.1$  for
BATs.  We  can  expect,  therefore,  that  choosing  the  best  neighbor  rather  than 
any  better  neighbor  will  lead  to  about  a  fifteen  percent  reduction  in  the  number 
of  iterations,  on  the  average.  This  estimate  is  pertinent  to  the  design  of  local 
improvement  algorithms  for  integer  programming  (Hillier,  1969).  As  we  will  discuss 
in  Chapter  5,  to  apply  the  model  to  the  Simplex  Method  for  linear  programming, 
the  model  can  be  modified  so  that  it  includes  only  a  subset  of  the  hypercube  vertices. 
When  this  is  done,  the  difference  between  OATs  and  BATs  increases  to  about  fifty 
percent,  which  is  consistent  with  experimental  results  of  38  to  67  percent  found 
in  (Cutler  and  Wolfe,  1963).  Though  more  iterations  may  be  required,  the  cost 
of  a  single  iteration  is  apt  to  be  considerably  less  in  the  case  of  better  adjacency 
because  fewer  function  evaluations  are  required.  We  face  a  tradeoff  between  the 
total  number  of  iterations  and  the  cost  of  a  single  iteration.  It  seems  likely  that 
better  adjacency  (or  some  improved  variant  of  it,  such  as  the  best  gradient  approach 
used  in  the  Simplex  Method)  will  usually  yield  the  more  efficient  algorithm.  Which 
alternative  we  choose  will  depend  on  the  nature  of  the  specific  problem. 
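The fifteen percent figure follows from the two linear fits quoted above. As a small worked check: over the simulated range n = 4, ..., 15 the relative saving drifts from about 17% down to 15%, and the fits taken at face value would tend to $0.13/0.92 \approx 14\%$ for large n (an extrapolation beyond the data).

```python
def oat_fit(n):
    return 0.79 * n - 0.2   # empirical OAT fit quoted in the text

def bat_fit(n):
    return 0.92 * n - 0.1   # empirical BAT fit quoted in the text

def relative_reduction(n):
    """Fractional saving in iterations from choosing the best rather than any better neighbor."""
    return (bat_fit(n) - oat_fit(n)) / bat_fit(n)

for n in (4, 10, 15):
    print(n, round(relative_reduction(n), 3))
```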


3.2  Theoretical  Results 

Theorem  3.2  For  the  boundary  and  coboundary  distributions,  the  expected  height
of  a  BAT  is  less  than  $en^2 \log n$.  The  same  bound  holds  for  OATs  with  the  boundary
distribution.

Corollary.  The  expected  number  of  iterations  of  the  Better  Adjacency  algorithm,
with  respect  to  either  the  boundary  or  coboundary  distribution,  is  less  than
$en^2 \log n$.  The  same  bound  holds  for  the  Optimal  Adjacency  algorithm  with  respect
to  the  boundary  distribution.
The  rest  of  this  section  is  devoted  to  the  proof  of  this  theorem.  Since  the  proof
at  times  seems  to  have  little  to  do  with  the  theorem,  it  may  be  helpful  to  give  a
general  idea  of  it  first.

Imagine  being  at  some  stage  of  a  process  which  randomly  generates  an  LG 
ordering  and  an  associated  tree  (OAT  or  BAT).  At  each  step  a  vertex  is  selected 
from  the  boundary,  assigned  a  father,  and  attached  to  the  tree;  the  vertex  has  a 
pathlength  one  greater  than  its  father’s.  The  height  of  the  ith  vertex  in  the  ordering
is  denoted  by  $H_i$.  Then  the  boundary  list  is  updated  and  the  process  iterates.  We
want  to  know  how  high  the  tree  is  going  to  get.  This  depends  on  which  vertices  tend 
to  be  fathers.  For  all  we  know,  though,  it  could  be  that  the  vertices  with  greatest 
height  tend  to  be  fathers.  If  so,  then  we  will  likely  get  a  long,  skinny  tree.  But 
suppose  we  knew  that,  at  each  iteration,  some  number,  say  100,  of  all  the  nodes 
in  the  tree  had  an  equal  chance  of  being  the  new  father.  What  would  this  piece  of 
information  tell  us?  In  the  worst  case  the  100  nodes  would  be  the  100  lowest  nodes 
in  the  tree  generated  so  far.  Thus  it  would  imply  that  the  expected  height  of  our 
tree  could  be  no  greater  than  the  expected  height  of  a  tree  produced  by  repeatedly 
attaching  a  son  to  a  node  randomly  chosen  from  among  the  hundred  lowest  nodes 
in  the  tree.  This  new  process  is  less  complicated  than  the  original  one,  and  it  turns 
out  to  be  possible,  though  not  trivial,  to  compute  its  rate  of  growth,  and  thus  a 
bound  on  OATs  and  BATs. 

The  proof  of  the  theorem,  therefore,  consists  of  three  parts.  The  first  part 
derives  a  lower  bound  on  the  number  of  vertices  in  the  boundary.  This  will  yield  a 
lower  bound  on  the  number  of  potential  fathers,  because  no  potential  father  can  have
more  than  $n-1$  neighbors  in  the  boundary.  The  second  part  proves  the  intuitively
obvious  fact  that  the  “new”  process  does  indeed  produce  trees  with  greater  mean
pathlength  than  the  “old”  BAT  and  OAT  producing  processes.  The  third  part  of 
the  proof  is  the  computation  of  the  rate  of  growth  of  the  “new”  process. 

Recall  that  a  hypergraph  H  is  a  set  of  points,  called  vertices,  together  with
a  collection  of  subsets  of  the  vertices,  and  that  these  subsets  are  called  edges.  In
the  case  of  a  graph,  all  the  edges  contain  exactly  two  vertices;  in  the  more  general
hypergraph  the  edges  can  vary  in  size,  some  possibly  consisting  of  one  vertex,
others  of  several.  For  example,  a  hypergraph  on  the  3  vertices  a,  b,  and  c
might  contain  the  edges  {a}  and  {a,  b,  c}.  As  another  example,  the  hypergraph
on  n  vertices  which  contains  all  possible  edges  will  have  $2^n$  of  them,  and  will  be  a
representation  of  the  set  of  all  subsets  of  the  n  vertices.

If  we  think  of  the  vertices  of  the  hypergraph  as  corresponding  to  each  of  the 
components  (or  dimensions)  of  the  n-cube,  an  edge  of  the  hypergraph  as  being  a 
vertex  of  the  n-cube,  and  adjacent  pairs  of  hypergraph  edges  as  corresponding  to 
edges  of  the  n-cube,  then  we  see  that  the  Kruskal-Katona  theorem  that  follows  is 
a  statement  about  sets  of  vertices  of  the  n-cube  with  minimal  size  boundaries.  We 
define  a  minimal  boundary  set  of  cardinality  i  to  be  a  collection  of  i  vertices  of  the
n-cube  whose  boundary  is  as  small  as  possible  for  a  set  of  i  vertices.

If  H  is  a  hypergraph  on  n  vertices  having  M  edges,  then  the  boundary  of  H 
is  the  set  of  all  edges  not  in  H  that  differ  from  members  in  H  in  only  one  vertex. 
Order  the  vertices  of  H  from  1  to  n  and  denote  an  edge  e  by  a  sequence  of  zeroes 
and  ones,  the  ith  entry  being  one  when  e  contains  the  ith  vertex.  Define  $w(e)$  to
be  the  $(n+1)$-digit  number  which  is  this  sequence  with  a  leftmost  digit  appended  so
that  the  sum  of  all  the  digits  equals  n.

Theorem  3.3  (Kruskal,  Katona)  A  hypergraph  on  n  vertices  having  M  edges  that
minimizes  its  boundary  can  be  formed  by  choosing  the  M  edges  having  the  largest  w
values.

Proof.  See  (Kleitman,  1979),  pp.  47-48. 
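For small n, Theorem 3.3 can be confirmed by exhaustive search. In the sketch below (our illustration; the function names are hypothetical) a vertex of the n-cube is an integer whose most significant bit corresponds to hypergraph vertex 1. Sorting by decreasing w then means: fewer ones first (larger leading digit), and among vertices with the same number of ones, the larger binary number first. The exhaustive search is feasible only up to about n = 4.

```python
def popcount(v):
    return bin(v).count("1")

def kk_order(n):
    """Vertices sorted by decreasing w: fewer ones first, then larger binary value."""
    return sorted(range(1 << n), key=lambda v: (n - popcount(v), v), reverse=True)

def boundary_size(vertices, n):
    """Number of vertices outside the set that are adjacent to some member."""
    s = set(vertices)
    return len({v ^ (1 << b) for v in s for b in range(n)} - s)

def brute_min_boundary(n):
    """Smallest boundary over all subsets of each size, by exhaustive search."""
    N = 1 << n
    nbr = [sum(1 << (v ^ (1 << b)) for b in range(n)) for v in range(N)]  # neighbor bitmasks
    best = [N] * (N + 1)
    for mask in range(1 << N):
        reach, m = 0, mask
        while m:
            low = m & -m                         # lowest member still to process
            reach |= nbr[low.bit_length() - 1]
            m ^= low
        size = popcount(mask)
        best[size] = min(best[size], popcount(reach & ~mask))
    return best

for n in (2, 3, 4):
    order, best = kk_order(n), brute_min_boundary(n)
    ok = all(boundary_size(order[:i], n) == best[i] for i in range((1 << n) + 1))
    print(n, ok)
```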


Boundary  Theorem  3.4  Let

$$P_k = \sum_{j=0}^{k}\binom{n}{j}$$

denote  the  sum  of  the  first  k  +  1  binomial  coefficients  and  let  $i = P_k$.  Then  the  size
of  the  smallest  boundary  of  a  set  of  size  i  is  $\binom{n}{k+1}$.  If  i  instead  satisfies

$$P_k < i < P_{k+1},$$

then  a  lower  bound  on  the  boundary  size  of  a  set  of  size  i  is  given  by

$$\begin{cases}\binom{n}{k+1} & \text{if } k \le \frac{n-3}{2},\\[4pt] \binom{n}{k+2} & \text{if } k \ge \frac{n-3}{2}.\end{cases}$$

When  $k = \frac{n-3}{2}$  the  bounds  for  the  two  cases  are  equal.


Proof.  According  to  the  construction  of  the  minimal  boundary  set  given  by  the
Kruskal-Katona  theorem,  all  vertices  (of  the  n-cube)  with  j  ones  must  be  picked
before  any  vertex  with  j  +  1  ones,  because  their  w  values  have  a  larger  leading  digit.
That  is,  if  the  minimal  boundary  set  given  by  the  theorem  contains  a  vertex  with
j  +  1  ones,  it  must  contain  all  vertices  with  j  ones.  Since  the  number  of  vertices
with  j  ones  equals  $\binom{n}{j}$,  the  minimal  boundary  set  of  cardinality  i  is  the  set  of  all
vertices  with  k  or  fewer  ones  when  i  equals  $P_k$,  the  sum  of  the  first  k  +  1  binomial
coefficients.  The  boundary  of  this  set  consists  of  all  vertices  with  exactly  k  +  1  ones.
There  are  $\binom{n}{k+1}$  such  points,  so  we  have  proved  the  first  part  of  the  theorem.

We  need  the  LYM  (Lubell,  Yamamoto,  and  Meshalkin)  inequality  to  prove  the 
rest  of  the  theorem.  The  possible  edges  of  a  hypergraph  form  a  partially  ordered 
set  by  inclusion.  Recall  that  an  antichain  F  is  a  collection  of  these  edges  such  that 
no  member  of  F  contains  another.  If  we  denote  the  number  of  members  of  F  of 
cardinality  j  by  $f_j$,  then  the  LYM  inequality  (Kleitman,  1974)  states  that

$$\sum_{j=0}^{n}\frac{f_j}{\binom{n}{j}} \le 1.$$
Now  suppose  that  i  is  some  intermediate  value  between  $P_k$  and  $P_{k+1}$.  The  Kruskal-Katona  set  contains  all  vertices  with  k  ones  and  some  vertices  with  k  +  1  ones.
The  set’s  boundary  obviously  includes  all  vertices  with  k  +  1  ones  (called  “(k+1)-vertices”  for  short)  that  are  not  in  the  set  itself,  as  well  as  those  (k+2)-vertices  that
are  neighbors  of  the  (k+1)-vertices  in  the  set.  Let  s  be  the  number  of  (k+1)-vertices
in  the  set,  so  $s = i - P_k$.  Let  z  equal  the  number  of  (k+2)-vertices  not  in  the  set’s
boundary.  The  (k+1)-vertices  in  the  set,  together  with  the  (k+2)-vertices  not  in  the
boundary,  form  an  antichain.  Then  by  the  LYM  property,

$$\frac{s}{\binom{n}{k+1}} + \frac{z}{\binom{n}{k+2}} \le 1.$$

If  k  is  less  than  or  equal  to  $(n-3)/2$,  then  $\binom{n}{k+2} \ge \binom{n}{k+1}$,  and  this  implies  that

$$\binom{n}{k+2} - z \ge s.$$

The  left  hand  side  of  the  above  is  the  number  of  (k+2)-vertices  in  the  boundary,
so  the  total  number  of  elements  in  the  boundary  is  at  least  $\left(\binom{n}{k+1} - s\right) + s = \binom{n}{k+1}$.  Similarly,  if
$k \ge (n-3)/2$,  then  $\binom{n}{k+1} \ge \binom{n}{k+2}$  and  the  LYM  inequality  yields

$$\binom{n}{k+1} - s \ge z.$$

The  left  hand  side  of  this  relation  is  the  number  of  (k+1)-vertices  in  the  boundary,
and  therefore  the  size  of  the  boundary  is  at  least  $z + \left(\binom{n}{k+2} - z\right) = \binom{n}{k+2}$.  Note  that  when  k  equals
$(n-3)/2$  the  number  n  must  be  odd,  so

$$\binom{n}{k+1} = \binom{n}{\frac{n-1}{2}} = \binom{n}{\frac{n+1}{2}} = \binom{n}{k+2},$$

and  the  bounds  for  the  two  cases  coincide.  ∎
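As a numerical sanity check of Boundary Theorem 3.4, the sketch below compares its bound with the boundary of the Kruskal-Katona set, which by Theorem 3.3 is as small as possible. It is our illustration: a vertex is an integer whose most significant bit stands for hypergraph vertex 1, so decreasing w means fewer ones first and, within a layer, larger binary value first. The boundary is maintained incrementally as vertices are added in that order.

```python
from math import comb

def popcount(v):
    return bin(v).count("1")

def kk_boundary_sizes(n):
    """sizes[i] = boundary size of the first i vertices in decreasing-w order."""
    order = sorted(range(1 << n), key=lambda v: (n - popcount(v), v), reverse=True)
    inside, boundary, sizes = set(), set(), [0]
    for v in order:
        inside.add(v)
        boundary.discard(v)
        boundary.update(w for w in (v ^ (1 << b) for b in range(n)) if w not in inside)
        sizes.append(len(boundary))
    return sizes

def theorem_3_4_bound(n, i):
    """Bound of Theorem 3.4 for 1 <= i < 2^n."""
    P, acc = [], 0
    for j in range(n + 1):
        acc += comb(n, j)
        P.append(acc)
    k = max(j for j in range(n + 1) if P[j] <= i)   # P_k <= i < P_{k+1}
    if i == P[k]:
        return comb(n, k + 1)                       # exact value at a partial sum
    return comb(n, k + 1) if 2 * k <= n - 3 else comb(n, k + 2)

for n in range(2, 11):
    sizes = kk_boundary_sizes(n)
    ok = all(theorem_3_4_bound(n, i) <= sizes[i] for i in range(1, 1 << n))
    print(n, ok)
```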


Consider  the  process  which  randomly  generates  an  OAT  according  to  the 
boundary  distribution,  and  let  the  random  variables  $H_1, \dots, H_{2^n}$  denote  the  heights
of  the  nodes  in  the  ordering.  By  convention  the  “height”  of  the  top  node  or  root  is
1.  Then  $H_1$  and  $H_2$  always  take  the  values  1  and  2  respectively,  $H_3$  takes  the  values
2  and  3  with  equal  probability,  and  the  distributions  of  the  later  $H_i$  are  increasingly
complicated.

If  X  and  Y  are  two  random  variables  with  cumulative  distribution  functions
$F_X(t)$  and  $F_Y(t)$  respectively,  then  X  is  said  to  stochastically  dominate  Y,  written
$X \succeq Y$,  if  $F_X(t) \le F_Y(t)$  for  all  t.  It  is  obvious  that  stochastic  dominance  is  transitive  and
that  if  $X \succeq Y$  then  $E[X] \ge E[Y]$,  where  $E[\cdot]$  denotes  expectation.  To  extend  this
concept  to  sequences  such  as  the  $H_i$  we  say  that  the  sequence  of  random  variables
$X = X_1, X_2, \dots$  stochastically  dominates  $Y = Y_1, Y_2, \dots$  iff  $X_i \succeq Y_i$  for  all  i,  and  we
write  $X \succeq Y$.

We  will  now  describe  a  stochastic  process  whose  outcomes  will  stochastically 
dominate  the  $H_i$.  The  process  is  called  the  largest  k  process  and  is  denoted  by
$L^k$.  Let  $k = k_1, k_2, \dots$  be  a  sequence  of  positive  integers.  The  sequence  of  random
variables  $L^k = L^k_1, L^k_2, \dots$  is  iteratively  produced  in  the  following  way:  $L^k_1 = 1$;
given  the  values  of  $L^k_1, \dots, L^k_{i-1}$,  one  of  the  $k_i$  largest  of  them  is  selected  at  random
(each  with  equal  probability)  and  its  value  plus  one  is  given  to  $L^k_i$.  For  example,
if  k  is  a  sequence  of  ones,  that  is  if  $k_i = 1$  for  all  i,  then  $L^k_i$  will  equal  i  for  all  i.  As
another  example,  suppose  that  $k_1 = k_2 = 1$,  and  $k_3 = k_4 = 2$.  Then  $L^k_1$  will
equal  1,  $L^k_2$  will  equal  2,  $L^k_3$  will  take  the  values  2  and  3  with  equal  probability,  and
$L^k_4$  will  take  the  value  3  with  probability  3/4  and  the  value  4  with  probability  1/4.
(It  is  sometimes  convenient  to  let  $k_i$  be  greater  than  $i - 1$;  in  these  cases  we  are
interested  only  in  the  long  term  behavior  of  the  $L_i$  and  we  will  be  able  to  assume
the  existence  of  $L_0, L_{-1}, L_{-2}, \dots$  each  with  value  zero.)
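The two examples above can be verified by enumerating the largest k process exactly. The sketch below (our illustration; function names are hypothetical) tracks every possible history together with its probability as a `Fraction`, which is feasible only for short sequences. When $k_i$ exceeds the number of values so far, it pads with zeros, following the convention in the parenthetical remark.

```python
from collections import defaultdict
from fractions import Fraction

def largest_k_exact(ks):
    """Exact joint law of (L_1, ..., L_len(ks)) for the largest k process.

    ks[0] is k_1 (unused, since L_1 = 1). When k_i exceeds i - 1, the missing
    values are padded with zeros, following the convention in the text.
    """
    states = {(1,): Fraction(1)}
    for k in ks[1:]:
        nxt = defaultdict(Fraction)
        for vals, p in states.items():
            padded = vals + (0,) * max(0, k - len(vals))
            top = sorted(padded, reverse=True)[:k]   # the k largest values so far
            for v in top:                            # each chosen with probability 1/k
                nxt[vals + (v + 1,)] += p / k
        states = nxt
    return states

def last_value_law(ks):
    """Marginal distribution of the final value L_{len(ks)}."""
    law = defaultdict(Fraction)
    for vals, p in largest_k_exact(ks).items():
        law[vals[-1]] += p
    return dict(law)

print(last_value_law([1, 1, 2, 2]))   # the second example in the text
print(last_value_law([1] * 5))        # all ones: L_5 = 5
```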

We  now  return  to  the  random  OATs  produced  by  the  boundary  distribution
and  the  random  heights  of  its  orderings  $H_1, \dots, H_{2^n}$.  Let  $V = V_1, \dots, V_{2^n}$  be
random  variables  which  are  the  vertices  in  the  ordering  generated,  so  $V_i$  is  the
ith  vertex  in  the  ordering.  Consider  any  partial  realization  of  the  OAT  generating
process:  suppose  the  first  i  vertices  in  the  ordering  have  been  chosen  and  denote
the  list  of  vertices  by  $v = v_1, \dots, v_i$.

Suppose  further  that  the  random  variables  $H_1, \dots, H_i$  have  values  $h_1, \dots, h_i$
respectively.  For  simplicity  of  notation,  we  use  $V = v$  to  denote  the  condition

$$V_j = v_j, \quad j = 1, \dots, i,$$

and  similarly,  we  let  $H = h$  denote  the  condition

$$H_j = h_j, \quad j = 1, \dots, i.$$

What  can  we  say  about  the  conditional  distribution  of  $H_{i+1}$,  given  $V = v$,  $H = h$?
When  i  is  greater  than  one,  any  vertex  $v_j$,  $1 \le j \le i$,  can  have  at  most  $n-1$
neighbors  in  the  boundary  of  v,  since  the  first  i  vertices  of  an  LG  ordering  form  a  connected  set  and  so  each  $v_j$  has  at  least  one  neighbor  among  them.  Let  $B(v)$  denote  the  boundary  of  v.  According
to  the  boundary  distribution,  every  member  of  $B(v)$  has  an  equal  probability  of
becoming  the  value  of  $V_{i+1}$.  Therefore,  for  any  particular  j,  the  probability  of  $v_j$
being  the  father  of  $v_{i+1}$  is  less  than  or  equal  to

$$\frac{n-1}{|B(v)|} \le \frac{n-1}{B(i)},$$

where  $B(i)$  is  the  lower  bound  on  the  boundary  size  given  by  the  boundary  theorem.

Assume  inductively  that  we  have  constructed  a  sequence  $L = L_1, \dots, L_i$,  such
that  $L_j \ge H_j$  for  all  $j = 1, \dots, i$,  and  such  that  L  is  distributed  as  a  largest
k  process  with  $k_j = \lfloor B(j)/(n-1) \rfloor$.  Given  $H = h$,  $V = v$,  it  is  easy  to  construct  $L_{i+1} \ge H_{i+1}$
so  that  $L_{i+1}$  is  one  more  than  a  value  chosen  with  equal  probability  from  the  largest  $\lfloor B(i)/(n-1) \rfloor$  values  in  the  L  sequence.  This  completely  defines  the  distribution  of  $L_{i+1}$  since  we  have
defined  its  conditional  distribution  for  all  conditions  $H = h$,  $V = v$.  Of  course,  the
distributions  of  H  and  L  are  not  independent.  We  may  not  know  the  probability
of  a  particular  event  $H = h$,  $V = v$.  We  do  know,  however,  that  L  is  distributed
as  a  largest  k  process  with  $k_i = \lfloor B(i)/(n-1) \rfloor$.  This  is  true  because  all  of  the
conditional  distributions  have  that  characteristic.  We  have  therefore  proved  the
dominance  lemma  below  for  OATs.

Dominance  Lemma  3.5  Suppose  $B(i)$  is  a  lower  bound  on  the  size  of  the
boundary  of  i  vertices.  Let  $k = k_1, k_2, \dots$  be  the  vector  of  integers  such  that
$k_i = \max[1, \lfloor B(i)/(n-1) \rfloor]$  and  let  H  denote  the  sequence  of  random  variables
which  are  the  heights  of  the  nodes  in  an  OAT  from  the  boundary  distribution.  Then
$L^k \succeq H$.  Moreover,  the  same  result  holds  for  a  BAT  from  either  the  boundary  or
coboundary  distribution.

Proof.  The  case  of  BATs  from  the  boundary  distribution  is  almost  exactly  the
same  as  the  above.  The  key  fact,  again,  is  that  the  probability  of  $v_j$  being  the  father
is  not  more  than

$$\frac{n-1}{B(i)}$$

for  any  particular  j.  The  only  difference  is  that  in  this  case,  it  is  necessary  to  specify
the  events  by  $V = v$,  $H = h$.  In  the  case  of  OATs,  it  would  have  sufficed  to  specify
only  $V = v$,  since  with  OATs  the  choice  of  vertices  in  the  ordering  completely
determines  the  pathlengths.

The  case  of  coboundary  BATs  is  similar.  If  S  is  a  subset  of  vertices  of  the
n-cube,  we  define  the  coboundary  of  S  as  the  set  of  all  pairs  $(x, y)$  such  that  x  is
in  S,  y  is  not  in  S,  and  x  and  y  are  adjacent.  In  this  case  we  let  S  equal  v,  so
the  coboundary  consists  of  all  pairs  $(x, y)$  such  that  x  is  in  the  partially  constructed
ordering,  y  is  in  the  boundary,  and  x  and  y  are  adjacent.  To  generate  the  BAT
we  choose  among  the  coboundary  pairs  with  equal  probability.  Again,  no  father
can  appear  in  more  than  $n-1$  pairs,  and  since  there  are  at  least  as  many  pairs
as  boundary  elements,  the  result  must  be  stochastically  dominated  by  $L^k$.  This
completes  the  proof  of  the  lemma.  ∎

Now  we  find  the  expected  behavior  of  $L^k$.  The  value  of  $B(i)$,  and  hence  $k_i$,
remains  the  same  as  i  ranges  between  two  consecutive  partial  sums  of  the  binomial
coefficients.  While  the  value  of  $k_i$  is  constant  and  equal  to,  say,  m,  the  largest  k
process  can  be  thought  of  as  a  climbing  process,  where  there  are  m  balls  distributed
contiguously  on  at  most  m  levels.  There  are  infinitely  many  empty  levels  above  for
the  balls  to  climb  up  to.  At  each  iteration,  a  ball  is  selected  at  random:  a  new  ball
is  added  to  the  level  above  the  selected  ball,  and  one  ball  is  removed  from  the  lowest
non-empty  level.  Simulation  results  by  the  author  suggested  that  this  process  climbs
at  a  rate  approaching  $e/m$  (where  e  is  the  base  of  the  natural  logarithm)  for  large
m.  Incomplete  proofs  were  given  by  (Keller,  1980;  Wiener,  1980).  This  conjecture
was  finally  proved  by  David  Aldous  and  James  Pittman:

Theorem  3.6  Let  $\rho_m$  denote  the  expected  rate  of  growth  of  the  climbing  process
when  there  are  m  balls.  Then  $m\rho_m$  is  less  than  e  and  asymptotically  approaches  e
as  $m \to \infty$.  Equivalently,  let  m  be  a  positive  integer  and  let  M  be  a  sequence  of
m’s.  Then  the  expected  rate  of  growth,  $\rho_m$,  of  the  sequence  $L^M$,  is  less  than  $e/m$
and  $m\rho_m \to e$  as  $m \to \infty$.

For  a  complete  proof  of  this  result,  see  (Aldous  and  Pittman,  1980).  Here  we
present  that  part  of  Aldous  and  Pittman’s  work  which  shows  that  the  expected  rate
of  growth,  $\rho_m$,  is  less  than  or  equal  to  $e/m$.
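The climbing process is easy to simulate, which is how a rate near $e/m$ can be observed empirically. The sketch below is our illustration, not the author’s simulation: it measures growth by the highest occupied level, keeps the ball levels in a binary heap so that a ball on the lowest level can be removed cheaply, and starts all m balls on one level. For m = 2 the rate can be worked out by hand: the two balls alternate between sharing one level and occupying two adjacent levels, and the long-run rate is 2/3, so $m\rho_m = 4/3 < e$.

```python
import heapq
import math
import random

def climbing_rate(m, steps=200_000, seed=0):
    """Estimate the growth rate of the climbing process with m balls.

    All balls start on level 0. Each step picks a ball uniformly at random,
    adds a ball one level above it, and removes a ball from the lowest level.
    Returns (highest level reached) / steps.
    """
    rng = random.Random(seed)
    levels = [0] * m                     # a valid min-heap: all entries equal
    top = 0
    for _ in range(steps):
        new = levels[rng.randrange(m)] + 1
        heapq.heapreplace(levels, new)   # pop a lowest ball, push the new one
        top = max(top, new)
    return top / steps

for m in (1, 2, 5, 10, 20):
    r = climbing_rate(m)
    print(m, round(r, 4), round(m * r, 3), "vs e =", round(math.e, 3))
```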

At  any  time  t  there  exists  a  finite  colony  of  particles  among  sites  0,  1,  2,  ...,
the  sites  forming  some  interval  $[0, f]$.  The  colony  evolves  by  individual  particles
independently  giving  birth  at  rate  1.  The  “daughter”  particle  gets  placed  at  the
site  to  the  right  of  the  “mother”  particle.  Let  $X_i(t)$  be  the  number  of  particles  at
site  i  at  time  t.  Let

$$X(t) = \{X_0(t), X_1(t), \dots\}.$$

We  can  think  of  $X(t)$  as  a  countably  valued  Markov  chain.  Note  that  the  $X(t)$  do
not  give  a  complete  description  of  the  process  since  they  ignore  genealogy.

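The  total  number  of  particles  at  time  t,

$$N(t) = \sum_{i \ge 0} X_i(t),$$

evolves  as  a  simple  birth  process.  Therefore,

$$\frac{d\,E(N(t))}{dt} = E(N(t))$$

and

$$E(N(t)) = e^t.$$

The  front  of  the  colony  at  time  t  is  the  largest  index  i  such  that  $X_i(t) \ge 1$,  and  is
denoted  by  $f(X(t))$.  We  will  prove  that

$$\limsup_{t \to \infty} \frac{f(X(t))}{t} \le e \quad \text{a.s.} \tag{3.2.1}$$

Let

$$m_i(t) = E(X_i(t))$$

equal  the  expected  number  of  particles  at  site  i  at  time  t.  Then  $m_0(t) = 1$  if  we
start  with  one  particle  at  zero.  Also,

$$\frac{d\,m_i(t)}{dt} = m_{i-1}(t), \qquad i \ge 1.$$

Solving  these  differential  equations  gives

$$m_i(t) = \frac{t^i}{i!}. \tag{3.2.2}$$

Now  let  $\gamma$  be  some  constant  greater  than  e.  We  have

$$P(f(X(t)) \ge \gamma t) = E\left(\mathbf{1}_{\{f(X(t)) \ge \gamma t\}}\right) \le E\left(\sum_{i \ge \gamma t} X_i(t)\right).$$

By  definition  and  (3.2.2),

$$E\left(\sum_{i \ge \gamma t} X_i(t)\right) = \sum_{i \ge \gamma t} m_i(t) = \sum_{i \ge \gamma t} \frac{t^i}{i!}. \tag{3.2.3}$$

Combining  (3.2.3)  with  the  above  yields

$$\sum_{t=1}^{\infty} P(f(X(t)) \ge \gamma t) \le \sum_{t=1}^{\infty}\sum_{i \ge \gamma t}\frac{t^i}{i!} = \sum_{i=1}^{\infty}\sum_{1 \le t \le i/\gamma}\frac{t^i}{i!}. \tag{3.2.4}$$

The  inner  summation  term  in  (3.2.4)  is  at  most  $(i/\gamma)^{i+1}/i!$,  and  by  Stirling’s  formula  this  can  be  approximated  by

$$\frac{(i/\gamma)^{i+1}}{i!} \approx \frac{\sqrt{i}}{\gamma\sqrt{2\pi}}\left(\frac{e}{\gamma}\right)^{i},$$

which  decays  geometrically  because  $\gamma > e$.  Therefore,  the  sum  in  (3.2.4)  converges.  By  the  Borel-Cantelli  Lemma,

$$f(X(t)) < \gamma t$$

for  all  sufficiently  large  integers  t,  almost  surely;  since  $\gamma > e$  was  arbitrary,  (3.2.1)  is  true.  ∎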
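The threshold e in (3.2.1) can also be seen numerically. By standard Poisson large-deviation asymptotics, $\log \sum_{i \ge \gamma t} t^i/i! \approx t\,\gamma(1 - \log\gamma)$ for $\gamma > 1$, and the exponent is negative exactly when $\gamma > e$; so the bound (3.2.3) shrinks with t only when $\gamma$ exceeds e. The sketch below (our illustration) evaluates the logarithm of the tail sum stably using `math.lgamma`.

```python
import math

def log_tail(t, gamma, extra=200):
    """Natural log of sum_{i >= gamma*t} t**i / i!  (the right side of (3.2.3)).

    For gamma > 1, successive terms shrink by a factor below 1/gamma, so
    truncating after `extra` terms leaves a negligible error.
    """
    start = math.ceil(gamma * t)
    logs = [i * math.log(t) - math.lgamma(i + 1) for i in range(start, start + extra)]
    peak = max(logs)
    return peak + math.log(sum(math.exp(x - peak) for x in logs))

for gamma in (2.0, 3.0):
    rates = [round(log_tail(t, gamma) / t, 3) for t in (10, 100, 1000)]
    print(gamma, rates, "limit:", round(gamma * (1 - math.log(gamma)), 3))
```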


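We  can  now  prove  Theorem  3.2,  which  we  stated  at  the  beginning  of  the  section.
For  $P_{j-1} \le i < P_j$  with  $j \le \lfloor (n-1)/2 \rfloor$,  the  boundary  theorem  gives  $B(i) \ge \binom{n}{j}$,  there  are  $\binom{n}{j}$  such  values  of  i,  and  by  Theorem  3.6  each  raises  the  expected  height  by  at  most  $e(n-1)/\binom{n}{j}$.
Hence  the  expected  height  reached  by  the  $L^k_i$  for  i  =  2  up  to  the  sum  of  the  first
$\lfloor (n+1)/2 \rfloor$  binomial  coefficients  is  at  most

$$\sum_{j=1}^{\lfloor (n-1)/2 \rfloor} e(n-1)\binom{n}{j}\Big/\binom{n}{j} = e(n-1)\left\lfloor \frac{n-1}{2} \right\rfloor.$$

The  expected  increase  in  $L^k_i$  as  i  then  increases  to  $2^n - 1$  is  at  most

$$e(n-1)(n \log n - n/2).$$

The  expected  increase  in  $L^k_i$  as  i  increases  from  1  to  2  and  from  $2^n - 1$  to  $2^n$  is
no  more  than  2.  The  sum  of  these  three  increases  is  less  than  $en^2 \log n$,  which  is
therefore  an  upper  bound  on  the  expected  height  of  boundary  OATs  and  BATs,  and
coboundary  BATs.  ∎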
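The bookkeeping in this proof can be carried out numerically for small n, without the asymptotic simplifications. The sketch below (ours, not from the report) adds $e/k_i$ over the iterations, with $k_i = \max[1, \lfloor B(i)/(n-1) \rfloor]$ as in Dominance Lemma 3.5 and $B(i)$ from Theorem 3.4, and prints the total next to $en^2 \log n$ (natural logarithm assumed). Like the proof, it applies the rate bound of Theorem 3.6 iteration by iteration even though $k_i$ changes from one range of i to the next.

```python
import math
from math import comb

def theorem_3_4_bound(n, i):
    """Boundary lower bound B(i) from Theorem 3.4, for 1 <= i < 2^n."""
    P, acc = [], 0
    for j in range(n + 1):
        acc += comb(n, j)
        P.append(acc)
    k = max(j for j in range(n + 1) if P[j] <= i)
    if i == P[k]:
        return comb(n, k + 1)
    return comb(n, k + 1) if 2 * k <= n - 3 else comb(n, k + 2)

def height_bound(n):
    """Sum over iterations of e / k_i with k_i = max(1, floor(B(i)/(n-1)))."""
    total = 0.0
    for i in range(1, 2 ** n):
        k_i = max(1, theorem_3_4_bound(n, i) // (n - 1))
        total += math.e / k_i            # Theorem 3.6: climb per iteration < e/k_i
    return total

for n in range(2, 13):
    print(n, round(height_bound(n), 1), round(math.e * n * n * math.log(n), 1))
```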
CHAPTER  4.

ADDITIONAL  BOUNDS  ON  AVERAGE  MEAN  PATHLENGTH 


4.1  Coboundary  OATs 

The  proof  of  the  dominance  lemma  fails  in  the  case  of  coboundary  OATs.  Both
common  sense  and  simulation  trials  suggest  that  coboundary  OATs  must  satisfy  smaller
bounds  than  coboundary  BATs,  but  we  have  been  unable  to  prove  that  this  is  indeed
the  case.  It  is  not  hard  to  see,  however,  that  if  m  equals  $B(i)/(n(n-1))$  instead
of  $B(i)/(n-1)$,  the  $L^k$  process  must  bound  the  $H_i$  values  of  coboundary  OATs.
This  puts  an  additional  factor  of  1/n  into  the  denominator  of  the  summation  term
in  the  proof  of  the  theorem  in  the  previous  section.  Therefore,  the  upper  bound
given  there  can  be  multiplied  by  n  to  apply  to  coboundary  OATs.  This  yields  the
following  weaker  result:

Theorem  4.1  The  expected  height  of  an  OAT  from  the  coboundary  distribution
is  less  than

$$en^3 \log n.$$


4.2  An  $O(n^2)$  Bound  on  Expected  Mean  Pathlength

The  height  of  a  tree  is  the  maximum  pathlength  of  a  node  in  the  tree;  the  mean 
pathlength  is  the  average  taken  over  all  nodes  in  the  tree.  The  bounds  that  have 
been  given  are  on  the  expected  height  rather  than  the  expected  mean  pathlength 
of  the  tree,  though  it  is  the  latter  that  corresponds  to  the  expected  performance 
of  the  algorithm.  Since  of  course  the  former  is  always  the  larger  of  the  two,  it  is 
possible  to  compute  a  slightly  better  bound. 


Theorem  4.2  The  expected  mean  pathlength  of  an  OAT  under  the  boundary
distribution  or  of  a  BAT  under  either  the  boundary  or  coboundary  distributions  is
less  than  $en^2$.

Proof:  The  expected  values  of  $h_1, \dots, h_t$,  where  $t = P_{\lfloor (n-1)/2 \rfloor}$,
are  all  less  than  $en^2/2$  because  $en^2/2$  is  greater  than  $e(n-1)\lfloor (n-1)/2 \rfloor$.  For  all  values
of  i  greater  than  $P_{\lfloor (n-1)/2 \rfloor}$  we  must  add  to  $en^2/2$  an  additional  amount  for  each  further  range  of  i  between  consecutive  partial  sums  that  has  been  passed.
In  general,  for  any  i,  $E(h_i)$  gets  an  additional

$$e(n-1)\binom{n}{j}\Big/\binom{n}{j+1}$$

for  each  $j = \lfloor (n-1)/2 \rfloor, \dots, n-1$  with  $P_j \le i$.  The  total  of  all  these  contributions  to  the  sum  of
the  pathlengths  is

$$\sum_{k} e(n-1)\,\frac{n+1-k}{k}\,P_k \;<\; e(n-1)\,n\sum_{k}\frac{1}{k}P_k \;<\; e(n-1)\,n\sum_{k}\binom{n}{k} \approx en^2 2^{n-1}.$$

So  $2^n en^2/2 + en^2 2^{n-1} = 2^n en^2$  is  a  bound  on  the  sum  of  the  expected  pathlengths,
and  therefore,  dividing  by  $2^n$,  the  number  of  vertices,  the  expected  mean  pathlength
is  bounded  by  $en^2$.  ∎




4.3  The  Coboundary


Recall  that  the  boundary  of  a  subset  S  of  vertices  of  the  n-cube  is  defined  as 
the  set  of  all  vertices  y  not  in  S  such  that  for  some  x  in  S,  (x,y)  is  an  edge  of  the 
cube.  We  define  the  coboundary  of  S  to  be  the  set  of  all  distinct  edges  (x,y)  where 
x  is  in  S  and  y  is  not.  The  coboundary  is  clearly  at  least  as  large  as  the  boundary.
Now  notice  that  in  the  proof  of  the  dominance  lemma  for  coboundary  BATs,  $B(i)$
could  serve  just  as  well  as  a  symbol  for  a  lower  bound  on  the  size  of  the  coboundary
as  for  the  size  of  the  boundary.  Our  next  goal  is  to  find  a  better  value  for  $B(i)$,  i.e.,
a  lower  bound  on  the  size  of  the  coboundary.  This  is  provided  via  the  following  theorem.

Theorem  4.3  (Harper,  Bernstein)  For  any  i,  $0 < i < 2^n$,  a  subset  of  the  vertices
of  size  i  with  minimal  coboundary  size  consists  of  the  i  smallest  vertices,  where  the
vertices  are  thought  of  as  n-digit  binary  numbers.

Proof.  See  (Clements,  1971,  or  Katona,  1974). 
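Theorem 4.3 can likewise be confirmed by exhaustive search for small n. In the sketch below (our illustration; names are hypothetical), a set of vertices is encoded as a bitmask over the $2^n$ vertex labels, so the set of the i smallest vertices is the mask $2^i - 1$. The search over all $2^{2^n}$ subsets is practical only up to n = 4.

```python
def popcount(v):
    return bin(v).count("1")

def cube_edges(n):
    """All edges (v, v with bit b set) of the n-cube, for v with bit b clear."""
    return [(v, v | (1 << b)) for v in range(1 << n) for b in range(n) if not (v >> b) & 1]

def coboundary_size(mask, edges):
    """Number of cube edges with exactly one endpoint in the set encoded by mask."""
    return sum(((mask >> x) & 1) != ((mask >> y) & 1) for x, y in edges)

def brute_min_coboundary(n):
    """Smallest coboundary over all subsets of each size (exhaustive, small n only)."""
    N, edges = 1 << n, cube_edges(n)
    best = [None] * (N + 1)
    for mask in range(1 << N):
        size, c = popcount(mask), coboundary_size(mask, edges)
        if best[size] is None or c < best[size]:
            best[size] = c
    return best

for n in (2, 3, 4):
    edges, best = cube_edges(n), brute_min_coboundary(n)
    initial = [coboundary_size((1 << i) - 1, edges) for i in range((1 << n) + 1)]
    print(n, initial == best)
```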

Theorem  4.4  If  i  is  greater  than  or  equal  to  $2^d$  and  less  than  $2^{d+1}$,  where  d  is  less
than  $n-1$,  then  $2^d(n-d)$  is  a  lower  bound  on  the  coboundary  of  a  set  of  cardinality
i.  If  i  is  greater  than  $2^{n-1}$,  a  bound  can  be  obtained  by  replacing  i  with  $2^n - i$.

Proof.  Let  $S_i$  be  the  number  of  interconnections  (unordered  pairs)  within  the
minimal  coboundary  set  of  cardinality  i.  The  coboundary  has  size  $ni - 2S_i$.  For
$2^d \le i < 2^{d+1}$  the  members  of  the  set  beyond  the  first  $2^d$  are  the  same  as  the  first  $i - 2^d$  members  except
for  the  “1”  in  the  dth  place.  Therefore,  $S_{i+1} - S_i$  equals  $S_{i+1-2^d} - S_{i-2^d}$  plus  the
number  of  points  with  a  “0”  in  the  dth  place  that  are  adjacent  to  the  (i+1)st  point.
This  obviously  equals  one.  Letting  $R_i = S_{i+1} - S_i$  denote  the  ith  difference,  we
have  shown  that

$$R_i = R_{i-2^d} + 1, \qquad 2^d \le i < 2^{d+1}.$$

This  relation  gives  rise  to  the  following  as  the  $R_i$  sequence  (the  first  row  is  $R_0$,  and  each  later  row  lists  $R_{2^d}, \dots, R_{2^{d+1}-1}$  for  $d = 0, 1, 2, 3, 4$):

0
1
1, 2
1, 2, 2, 3
1, 2, 2, 3, 2, 3, 3, 4
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5

The  following  proposition  states  some  elementary  properties  of  the  $R_i$  and  $S_i$
sequences.

Proposition.

1)  $S_{2^d} = d\,2^{d-1}$
2)  $S_{2^{d+1}} - S_{2^d} = (d+2)\,2^{d-1}$
3)  $R_i = R_{i-2^{d-1}}$;  $2^d \le i < 2^d + 2^{d-1}$
4)  $R_i = 1 + R_{i-2^{d-1}}$;  $2^d + 2^{d-1} \le i < 2^d + 2^{d-1} + 2^{d-2}$

We  prove  the  theorem  inductively,  assuming  it  is  true  for  i  less  than  $2^d$  to  show
it  is  true  for  i  less  than  $2^{d+1}$.  When  $i = 2^d$,  the  set  is  a  d-dimensional  cube  and  the
bound  given  by  the  theorem  is  tight.  Letting  $j = i - 2^d$,  we  have  $ni - 2S_i - 2^d(n-d) = nj - 2(S_i - S_{2^d})$,  so  it  is  therefore  necessary
and  sufficient  to  show  that

$$nj \ge 2\sum_{p=2^d}^{i-1} R_p,$$

where  $2^d < i < 2^{d+1} < 2^n$.

For  the  range  $2^d < i < 2^d + 2^{d-1}$,  by  the  proposition  above,  we  have  $R_i = R_{i-2^{d-1}}$,
and  the  result  follows  from  the  inductive  hypothesis.

For  the  range  $2^d + 2^{d-1} \le i < 2^{d+1}$,  we  note  that  it  is  sufficient  to  prove
the  result  for  the  case  where  d  equals  $n-2$.  In  this  case  the  symmetry  between
i  and  $2^n - i$  allows  us  to  prove  the  result  for  the  range  $2^{d+1} < i \le 2^{d+1} + 2^{d-1}$
instead.  The  bound  given  by  the  theorem  obviously  holds  for  $i = 2^{d+1}$.  But  $R_i$
in  this  range  equals  $R_{i-2^d}$,  and  the  bound  has  already  been  shown  to  hold  for  the
range  $2^d < i < 2^d + 2^{d-1}$.  This  completes  the  proof  of  Theorem  4.4.  ∎
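The table of $R_i$ values above matches the number of ones in the binary expansion of i: vertex i is adjacent to exactly those smaller vertices obtained by clearing one of its 1-bits, which is consistent with $R_i = R_{i-2^d} + 1$. Taking this as our reading of the table, the sketch below computes the coboundary $ni - 2S_i$ of the initial segments from Theorem 4.3 and checks the bound of Theorem 4.4, including the mirrored case $i > 2^{n-1}$, for small n. Function names are hypothetical.

```python
def popcount(v):
    return bin(v).count("1")

def coboundary_of_initial_segment(n, i):
    """Coboundary n*i - 2*S_i of the set {0, ..., i-1}, with S_i = R_0 + ... + R_{i-1}."""
    S_i = sum(popcount(p) for p in range(i))   # R_p = popcount(p): edges from p to smaller labels
    return n * i - 2 * S_i

def theorem_4_4_bound(n, i):
    """2^d (n - d) for 2^d <= i < 2^{d+1}; mirror i -> 2^n - i above 2^{n-1}."""
    if i > 2 ** (n - 1):
        i = 2 ** n - i
    d = i.bit_length() - 1
    return 2 ** d * (n - d)

for n in range(2, 11):
    ok = all(coboundary_of_initial_segment(n, i) >= theorem_4_4_bound(n, i)
             for i in range(1, 2 ** n))
    print(n, ok)
```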
4.4 An O(n log n) bound for coboundary BATs.

We are now able to compute a better bound for the coboundary distribution. When d, the size of the partially constructed ordering, ranges between $2^t$ and $2^{t+1}$ for $t < n - 1$, the $H_i$ values grow at a rate bounded by a largest-k process with $k = 2^t(n-t)/(n-1)$. This range for d consists of $2^{t+1} - 2^t = 2^t$ choices (iterations of the largest-k process), so the increase in $H_i$ values is bounded by

$$\frac{2^t e}{k} = \frac{e(n-1)}{n-t}.$$

By the equivalence between d and $2^n - d$ mentioned in the theorem above, the increase in $H_i$ values of the bounding process is mirrored as d ranges between $2^{n-1}$ and $2^n$. Therefore, a bound on the last $H_i$ value is given by twice the summation, from t = 0 to n − 1, of $e(n-1)/(n-t)$:

$$2e(n-1)\sum_{t=0}^{n-1}\frac{1}{n-t} \approx 2e(n-1)\log n.$$
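A quick numerical check of this last step: the sum is the harmonic number $H_n$, and $H_n - \log n$ stays below 1 (it tends to Euler's constant), so the ratio of the exact sum to $2e(n-1)\log n$ tends to 1 as n grows. The function names are illustrative.

```python
import math


def harmonic_bound(n):
    """2e(n-1) * sum_{t=0}^{n-1} 1/(n-t), i.e. 2e(n-1) H_n."""
    return 2 * math.e * (n - 1) * sum(1.0 / (n - t) for t in range(n))


def log_bound(n):
    """The asymptotic form 2e(n-1) log n used in the text."""
    return 2 * math.e * (n - 1) * math.log(n)


for n in (10, 1000, 100000):
    print(n, harmonic_bound(n) / log_bound(n))
```

The ratio behaves like $1 + \gamma/\log n$ with $\gamma \approx 0.577$, so the approximation is asymptotic rather than an exact inequality for small n.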

Theorem 4.5 The expected mean pathlength of coboundary BATs is less than $2en\log n$. The expected mean pathlength of coboundary OATs is less than $2en^2\log n$.

Proof: The bound for coboundary BATs has been proved. To see how to adjust this bound to apply to coboundary OATs, we must examine the reasoning used in the Dominance Lemma. Again, suppose the random generation of a coboundary OAT is partially realized and that the ith vertex is about to be chosen. Fix j to be less than i and consider the jth vertex chosen. What is the probability that this vertex will be the father of the ith vertex? The jth vertex cannot have more than n − 1 neighbors in the boundary. Each of these neighbors has at most a probability n/C of being chosen as the ith vertex, where C is the size of the coboundary. Therefore, the probability of being a father cannot be more than n(n − 1)/C. This puts an extra factor of n into the computation of the upper bound for the case of coboundary OATs. |
CHAPTER 5.

APPLICATIONS  TO  PRINCIPAL  PIVOTING  AND  THE  SIMPLEX  METHOD 


It is not essential to the concept of a BAT that there be a real-valued function defined on the vertices. All that is necessary is that one vertex be optimal in some sense and that every other vertex have a father (a neighboring vertex) assigned to it in such a way that the resulting structure is a tree, i.e., has no cycles, or equivalently, traversing from vertex to father to father must eventually lead to the root. Consider the linear complementarity problem: given $q \in \mathbb{R}^n$ and $M \in \mathbb{R}^{n \times n}$, find $w, z \in \mathbb{R}^n$ such that

$$
\begin{aligned}
-Mz + Iw &= q\\
z^T w &= 0\\
z &\ge 0\\
w &\ge 0.
\end{aligned}
$$

The possible solutions to this problem involve choices of the complementary basis B, an n by n matrix whose ith column is the ith column of either −M or the identity matrix I. When M is a P-matrix, principal pivoting methods solve LCP by iteratively proceeding from one complementary basis to another that differs in the choice of one column. We let the ith digit of our binary n-vector equal zero if the ith column of B is taken from the identity matrix, and one if it is taken from −M. (This notation is unambiguous because the ith column of a matrix with positive principal minors cannot be the ith column of −I.) Then each choice of feasible
basis  corresponds  to  a  vertex  of  the  n-cube,  and  the  pivoting  algorithms  proceed 
from  one  such  vertex  to  an  adjacent  one.  Thus  the  BAT  model  has  features  in 
common  with  principal  pivoting  methods  for  the  linear  complementarity  problem. 
It  is  well  known  that  such  pivoting  algorithms  require  O(n)  iterations  in  practice, 
though  there  are  exponentially  bad  classes  of  instances  (see  Cottle,  1978).  This 
accords  exactly  with  the  simulation  predictions  of  expected  BAT  depth  and  with 
theoretical  worst  case  BATs. 
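The binary encoding can be made concrete with a brute-force sketch. It is not a principal pivoting method: it simply enumerates all $2^n$ vertices of the cube, builds the complementary basis B for each, and keeps those with $B^{-1}q \ge 0$; `neighbours` lists the vertices reachable by changing the choice of one column, which is the move a principal pivot makes. Exact rational arithmetic avoids rounding issues. All helper names and the 2-by-2 example are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def solve(A, b):
    """Solve A x = b exactly (Gauss-Jordan over the rationals); None if singular."""
    n = len(A)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if piv is None:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def basis(M, bits):
    """Complementary basis: column i is column i of -M if bits[i] == 1, else e_i."""
    n = len(M)
    return [[-M[r][i] if bits[i] else int(r == i) for i in range(n)]
            for r in range(n)]


def feasible_vertices(M, q):
    """Cube vertices whose complementary basis B gives B^{-1} q >= 0, with (z, w)."""
    n = len(q)
    found = []
    for bits in product((0, 1), repeat=n):
        x = solve(basis(M, bits), q)
        if x is not None and all(v >= 0 for v in x):
            z = [x[i] if bits[i] else Fraction(0) for i in range(n)]
            w = [Fraction(0) if bits[i] else x[i] for i in range(n)]
            found.append((bits, z, w))
    return found


def neighbours(bits):
    """Vertices reached by changing the choice of exactly one column."""
    return [bits[:i] + (1 - bits[i],) + bits[i + 1:] for i in range(len(bits))]


print(feasible_vertices([[2, 1], [1, 2]], [-1, -1]))
```

For the P-matrix M = [[2, 1], [1, 2]] and q = (−1, −1), only the vertex (1, 1) is feasible, giving z = (1/3, 1/3) and w = 0. Enumeration costs $2^n$ linear solves, so it is useful only for checking small examples.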


5.2  The  Simplex  Method 


The Simplex Method solves the Linear Programming problem (LP) by passing from vertex to vertex on the polytope defined by the constraining hyperplanes. Suppose the number of variables is k and the number of linear inequality constraints is n, where n > k. Then there are at most $\binom{n}{k}$ basic solutions, each defined by the intersection of k hyperplanes. Each such point can be represented by an n-digit binary vector with exactly k ones in it, indicating which of the n hyperplanes define that particular point (or, equivalently, which variables are nonbasic). If the point exists and is feasible, we assign it a function value equal to the objective function value; if some constraints are violated, we subtract the appropriate multiple of M, where M is some suitably large value, from the objective function value to get the function value. The Simplex Algorithm always proceeds by taking one column (hyperplane) out of the basis set and putting one column in, so two of the binary vectors can be thought of as adjacent if they differ in two components. Thus, by restricting ourselves to just those vertices with k ones, and changing the definition of adjacency accordingly, we can apply the OAT and BAT models to the Simplex Method for linear programming. We call the space of n-tuples with k ones n,k space, and its elements n,k-vectors. We say that two n,k-vectors are adjacent if they differ in exactly two components. In the following it is assumed that $k \le n/2$, since the problem is the same for vectors with k ones as it is for vectors with n − k ones.
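A small sketch of n,k space as just defined: vectors are 0/1 tuples with exactly k ones, and adjacency swaps one 1 with one 0 (one hyperplane leaves the defining set, another enters). On small cases it confirms that each n,k-vector has k(n − k) neighbours. The names are illustrative.

```python
from itertools import combinations
from math import comb


def nk_space(n, k):
    """All n,k-vectors: 0/1 tuples of length n with exactly k ones."""
    return [tuple(1 if i in ones else 0 for i in range(n))
            for ones in combinations(range(n), k)]


def adjacent(u, v):
    """Adjacency in n,k space: the vectors differ in exactly two components."""
    return sum(a != b for a, b in zip(u, v)) == 2


def neighbours(u):
    """Swap one 1 with one 0 (one hyperplane leaves, another enters)."""
    ones = [i for i, a in enumerate(u) if a == 1]
    zeros = [i for i, a in enumerate(u) if a == 0]
    result = []
    for i in ones:
        for j in zeros:
            v = list(u)
            v[i], v[j] = 0, 1
            result.append(tuple(v))
    return result


space = nk_space(6, 2)
print(len(space), len(neighbours(space[0])))  # 15 8
```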

Simulation results, as before, give a formula for expected tree depth that is approximately equal to the logarithm of the total number of points, $\binom{n}{k}$, in the space. For OATs with k/n = the average mean pathlength is roughly equal to 0.5n − 0.2, while for BATs it is approximately 0.75n − 1.

To get theoretical bounds, we need a lower bound on the size of the coboundary or boundary of a subset of the n,k-vectors. Unfortunately, no such bound is known. However, Kleitman and West have conjectured what the minimal coboundary subsets are:
Conjecture 5.1 (Kleitman, 1979) How can we choose a k-hypergraph H on n
vertices  having  d  edges  to  minimize  the  coboundary?  The  conjecture  is  that  there  is 
an  optimal  H  for  any  d  so  that  either  H  or  its  complement  as  a  k-hypergraph  or  the 
complements  of  the  edges  of  H  or  the  complements  of  these  as  an  (n  —  k)-hypergraph 
do  not  contain  any  edges  containing  the  n-th  vertex. 

This  somewhat  confusing  statement  gives  a  complete  answer  to  the  problem 
if  applied  inductively.  We  clarify  it  and  describe  the  break  points  that  allow  the 
inductive  computation  in  the  next  section.  Our  goal,  as  before,  is  to  derive  numerical 
bounds  on  the  size  of  the  coboundary  for  different  ranges  of  values  of  d.  These 
bounds  are  stated  below: 

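Lemma 5.10 Let $\mathrm{Mcob}(d)_{n,k}$ denote the size of the smallest possible coboundary of a subset of cardinality d in n,k space (according to the conjecture). Suppose $d \le \frac12\binom{n}{k}$ and $k \le n/2$. Let p be the largest integer such that $k \le \frac12(n-p+1)$, so that $p = n - 2k + 1$. Then

Case 1: If $\binom{n-1}{k-1} \le d \le \frac12\binom{n}{k}$, then $\mathrm{Mcob}(d)_{n,k} \ge k\binom{n-1}{k}$.

Case 2: If $\binom{n-1}{k-1} > d \ge \binom{n-p}{k-1}$, then there is some i, $1 \le i \le p - 1$, such that

$$\binom{n-i-1}{k-1} \le d < \binom{n-i}{k-1}, \quad\text{and}$$

$$\mathrm{Mcob}(d)_{n,k} \ge k\left[\binom{n-i}{k} + (i-1)\binom{n-i-1}{k-1}\right].$$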

Case 3: If $\binom{n-p-2}{k-2} \le d < \binom{n-p}{k-1}$, then

$$\mathrm{Mcob}(d)_{n,k} \ge (kp + 2k - 2)\binom{n-p-2}{k-2}.$$

Case 4: Otherwise, there exists an integer m such that $2 \le m \le k - 1$, and

$$\binom{2k-2m-1}{k-m-1} \le d < \binom{2k-2m+1}{k-m}.$$
Then


$$\mathrm{Mcob}(d)_{n,k} \ge \bigl[(m-2)(2k-m+2) + 5k - 5m + 4 + k(p-1)\bigr]\binom{2k-2m-1}{k-m-1}.$$


We defer the proof of Lemma 5.10 until the next section because of its length. In the meantime, we can now calculate an upper bound on the expected mean pathlength of a BAT from the coboundary distribution in n,k space. Since $\mathrm{Mcob}(d)_{n,k}$ equals $\mathrm{Mcob}(d')_{n,k}$ where $d' = \binom{n}{k} - d$, we do the computation for the range $1 \le d \le \frac12\binom{n}{k}$ and multiply the result by two. We also of course multiply by $ek(n-k)$, because each point has $k(n-k)$ neighbors. Case 1 of Lemma 5.10 yields

$$\frac{\frac12\binom{n}{k} - \binom{n-1}{k-1}}{k\binom{n-1}{k}} = \frac{1}{2k}\cdot\frac{n-2k}{n-k}.$$
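The Case 1 simplification can be confirmed with exact rational arithmetic. The sketch compares both sides for every $1 \le k \le n/2$ with n up to 30; the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def case1_ratio(n, k):
    """(1/2 C(n,k) - C(n-1,k-1)) / (k C(n-1,k)): width of the Case 1 range of d
    divided by the Case 1 lower bound on the coboundary."""
    width = Fraction(comb(n, k), 2) - comb(n - 1, k - 1)
    return width / (k * comb(n - 1, k))


def case1_closed_form(n, k):
    """The simplified form (1/(2k)) * (n - 2k)/(n - k)."""
    return Fraction(n - 2 * k, 2 * k * (n - k))


print(case1_ratio(10, 3), case1_closed_form(10, 3))
```

The identity rests on $\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$ and $\binom{n-1}{k-1} = \frac{k}{n-k}\binom{n-1}{k}$.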


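In case 2 of Lemma 5.10, i ranges from 1 to p − 1. The size of the range of d is $\binom{n-i}{k-1} - \binom{n-i-1}{k-1} = \binom{n-i-1}{k-2}$, and $\mathrm{Mcob}(d)_{n,k}$ is at least

$$k\left[\binom{n-i}{k} + (i-1)\binom{n-i-1}{k-1}\right] = \binom{n-i-1}{k-1}(n-k-i+ik).$$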


Therefore, case 2 yields

$$\sum_{i=1}^{p-1}\frac{\binom{n-i-1}{k-2}}{\binom{n-i-1}{k-1}(n-k-i+ik)}.$$


The ith term in this sum equals

$$\frac{k-1}{(n-k-i+ik)(n-i-k+1)} \le \frac{k-1}{k(n-k)}\left(\frac{1}{n-k-i} + \frac{1}{i}\right).$$

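Summing over $1 \le i \le p - 1 = n - 2k$, the sum for case 2 is therefore, up to lower-order terms, at most

$$\frac{k-1}{k(n-k)}\log\left(\frac{(n-k)(n-2k)}{k}\right).$$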
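The Case 2 algebra can be checked exactly as well. This sketch compares the ith summand (range width over the Case 2 bound) with its closed form and with the partial-fraction upper bound, for i = 1, ..., p − 1 = n − 2k; the names are illustrative.

```python
from fractions import Fraction
from math import comb


def case2_term(n, k, i):
    """Width of the i-th Case 2 range of d over the Case 2 coboundary bound."""
    width = comb(n - i, k - 1) - comb(n - i - 1, k - 1)      # = C(n-i-1, k-2)
    bound = k * (comb(n - i, k) + (i - 1) * comb(n - i - 1, k - 1))
    return Fraction(width, bound)


def closed_form(n, k, i):
    """(k - 1) / ((n - k - i + ik)(n - i - k + 1))."""
    return Fraction(k - 1, (n - k - i + i * k) * (n - i - k + 1))


def upper_bound(n, k, i):
    """(k - 1)/(k(n - k)) * (1/(n - k - i) + 1/i)."""
    return Fraction(k - 1, k * (n - k)) * (Fraction(1, n - k - i) + Fraction(1, i))


print(case2_term(12, 3, 2), closed_form(12, 3, 2), upper_bound(12, 3, 2))
```

The inequality follows from the partial-fraction split of the closed form, together with $(n-k-i) \ge k \ge 1$ on this range of i.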
Case 3 yields


$$\frac{\binom{n-p}{k-1} - \binom{n-p-2}{k-2}}{(kp+2k-2)\binom{n-p-2}{k-2}} = \frac{3k-2}{k(kp+2k-2)} < \frac{3}{kp}.$$

Similarly, case 4 yields a summation with numerator

$$\binom{2k-2m+1}{k-m} - \binom{2k-2m-1}{k-m-1} = \binom{2k-2m}{k-m}\frac{3k-3m+1}{2(k-m+1)}.$$

If we discard the $(m-2)(2k-m+2)$ term that appears in the denominator, the summation term increases, so it is less than

$$\frac{\binom{2k-2m}{k-m}\frac{3k-3m+1}{2(k-m+1)}}{\bigl(5k-5m+4+k(p-1)\bigr)\binom{2k-2m-1}{k-m-1}},$$

which, because $\binom{2k-2m}{k-m} = 2\binom{2k-2m-1}{k-m-1}$ and $(3k-3m+1)/(k-m+1) < 3$, is less than

$$\frac{3}{5(k-m)+4+k(p-1)}.$$
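Writing j = k − m, the Case 4 numerator identity and the bound by 3 can be checked exactly. This is an illustrative sketch, not part of the original argument.

```python
from fractions import Fraction
from math import comb


def case4_numerator(j):
    """C(2j+1, j) - C(2j-1, j-1), where j = k - m."""
    return comb(2 * j + 1, j) - comb(2 * j - 1, j - 1)


def case4_closed_form(j):
    """C(2j, j) (3j + 1) / (2 (j + 1))."""
    return Fraction(comb(2 * j, j) * (3 * j + 1), 2 * (j + 1))


def numerator_over_lower_binomial(j):
    """Numerator divided by C(2j-1, j-1); equals (3j + 1)/(j + 1) < 3."""
    return case4_closed_form(j) / comb(2 * j - 1, j - 1)


print([numerator_over_lower_binomial(j) for j in range(1, 6)])
```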

Changing the index of summation to $z = k - m$ and using the fact that $p - 1 = n - 2k$, case 4 yields

$$\sum_{z=1}^{k-2}\frac{3}{5z+4+k(n-2k)} \le \frac{3}{5}\log\left(1 + \frac{5k-10}{k(n-2k)}\right) \le \frac{3}{n-2k}.$$

Let $r = k/n < \frac12$ and let $n \to \infty$. Adding the contributions of the four cases and multiplying by $2ek(n-k) = 2er(1-r)n^2$, we get

Theorem 5.2 $2ern\log\bigl(n(1-r)(1-2r)/r\bigr) + 6enr(1-r)/(1-2r)$

is an O(n log n) bound on expected BAT height under the coboundary distribution.

Corollary. Let $r = k/n < \frac12$ be fixed, and suppose the minimal coboundary conjecture holds. Then under the coboundary distribution, the expected number of iterations required to solve LP by a simplex-type better adjacency algorithm, as $n \to \infty$, is less than

$$2ern\log\bigl(n(1-r)(1-2r)/r\bigr) + 6enr(1-r)/(1-2r) \approx 2ek\log n.$$
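To see how quickly the bound approaches its leading term $2ek\log n$, the expression of Theorem 5.2 can be evaluated directly. This is a sketch with illustrative names; it only evaluates the formula and says nothing about the conjecture it depends on.

```python
import math


def theorem_5_2_bound(n, r):
    """Theorem 5.2 expression for fixed r = k/n with 0 < r < 1/2."""
    if not 0 < r < 0.5:
        raise ValueError("r = k/n must satisfy 0 < r < 1/2")
    e = math.e
    return (2 * e * r * n * math.log(n * (1 - r) * (1 - 2 * r) / r)
            + 6 * e * n * r * (1 - r) / (1 - 2 * r))


def leading_term(n, r):
    """2 e k log n with k = r n, the asymptotic form in the Corollary."""
    return 2 * math.e * r * n * math.log(n)


for n in (10**3, 10**6, 10**9):
    print(n, theorem_5_2_bound(n, 0.25) / leading_term(n, 0.25))
```

Both correction terms are of relative size $1/\log n$, so the ratio decreases toward 1 only slowly.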
5.3 Proof of Lemma 5.10.

We prove Lemma 5.10 in several parts. Recall that $\mathrm{Mcob}(d)_{n,k}$ is defined as the minimum coboundary size of a subset of cardinality d in n,k space. We define the 1-part (respectively 0-part) of a set of n,k-vectors to be the subset of vectors with nth coordinate equal to one (respectively zero). Thus $\binom{n-1}{k-1}$ is the size of the 1-part of n,k space. If $k \le n/2$ and $d \le \frac12\binom{n}{k}$, then the conjecture says that if d is smaller than the 1-part of n,k space, the optimal hypergraph is in the 0-part of n,k space; otherwise it contains the entire 1-part of n,k space (Kleitman, 1979). Let $H(d)_{n,k}$ denote the optimal choice of d n,k-vectors specified by the conjecture.

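Lemma 5.3 Suppose $k \le n/2$; then $\binom{n-1}{k-1} \le \frac12\binom{n}{k}$. If $\binom{n-1}{k-1} \le d \le \frac12\binom{n}{k}$, then

$$\mathrm{Mcob}(d)_{n,k} \ge \mathrm{Mcob}\left(\binom{n-1}{k-1}\right)_{n,k}.$$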


Proof. If we let $\Delta d$ equal $d - \binom{n-1}{k-1}$, then $H(d)_{n,k}$ includes the 1-part of n,k space and $\Delta d$ elements from the 0-part. We compute the number of directed edges between these $\Delta d$ elements and the 1-part: each member of the 0-part has k neighbors in the 1-part, so the number is $2k\Delta d$. When we add a set of $\Delta d$ elements to the 1-part, we bring in $\Delta d\,k(n-k)$ new edges; what we must show is that the number of new interconnections between members of the set is not more than the number of new edges brought in, so that the new coboundary is not less than the coboundary of the 1-part. We claim that $k\Delta d(n-k-2)$ is greater than or equal to the number of interconnections (directed edges) within any set of $\Delta d$ elements of the 0-part of n,k space. Obviously, a proof of this claim will prove Lemma 5.3.

Proof of Claim: The number of interconnections is maximized when the coboundary is minimized. So the question is, what does $H(\Delta d)_{n-1,k}$ look like? We note that

$$\Delta d \le \tfrac12\binom{n}{k} - \binom{n-1}{k-1} = \tfrac12\left[\binom{n-1}{k} - \binom{n-1}{k-1}\right] < \tfrac12\binom{n-1}{k},$$

so if $k < n/2$ we can use Lemma 5.3 inductively. This is what we do in case (i).

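Case (i): If $\Delta d \ge \binom{n-2}{k-1}$ and $k < n/2$, then inductively we know that

$$\mathrm{Mcob}(\Delta d)_{n-1,k} \ge \mathrm{Mcob}\left(\binom{n-2}{k-1}\right)_{n-1,k}.$$

It is easy to see that the number of interconnections in $H\left(\binom{n-2}{k-1}\right)_{n-1,k}$ equals $\binom{n-2}{k-1}(k-1)(n-k-1)$. It is also evident that $\mathrm{Mcob}(a)_{n-1,k}$ + [the number of interconnections in $H(a)_{n-1,k}$] $= ak(n-k-1)$, which implies that

$$
\begin{aligned}
&k\Delta d(n-k-1) - [\text{interconnections in } H(\Delta d)_{n-1,k}]\\
&\quad\ge k\binom{n-2}{k-1}(n-k-1) - \left[\text{interconnections in } H\left(\tbinom{n-2}{k-1}\right)_{n-1,k}\right]\\
&\Rightarrow [\text{interconnections in } H(\Delta d)_{n-1,k}]\\
&\quad\le (k-1)\binom{n-2}{k-1}(n-k-1) + (n-k-1)k\Delta d - (n-k-1)k\binom{n-2}{k-1}\\
&\quad= (n-k-1)\left(k\Delta d - \binom{n-2}{k-1}\right)\\
&\quad= (n-k-2)k\Delta d + k\Delta d - (n-k-1)\binom{n-2}{k-1}.
\end{aligned}
$$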

To prove the claim in this case, we must show that the right-hand side of the above is no more than $k\Delta d(n-k-2)$. Equivalently, we must show that $\binom{n-2}{k-1}(n-k-1) \ge k\Delta d$. Now

$$\Delta d < \tfrac12\binom{n-1}{k} \le \binom{n-2}{k}$$

and

$$\binom{n-2}{k-1}\frac{n-k-1}{k} = \binom{n-2}{k},$$

so we are done with case (i).


Case (ii): Suppose $k < n/2$ and $\Delta d < \binom{n-2}{k-1}$. Then $k - 1 < (n-2)/2$, so the conjecture says that $H(\Delta d)_{n-1,k}$ is in the 0-part of n − 1,k space. In this space, the total number of neighbors a point has is $k(n-k-2)$, so the total number of interconnections in $H(\Delta d)_{n-1,k}$ can't be more than $(n-k-2)k\Delta d$, which is the desired result.


Case (iii): The only case left occurs when $\Delta d \ge \binom{n-2}{k-1}$ and $k = n/2$. However, since $\Delta d \le \frac12\left[\binom{n-1}{k} - \binom{n-1}{k-1}\right]$ and $\binom{n-1}{k} = \binom{n-1}{k-1}$ when $k = n/2$, $\Delta d$ must be $\le 0$, so this case is vacuous. Therefore, the claim and Lemma 5.3 are proved. |
Remark.

$$\mathrm{Mcob}\left(\binom{n-1}{k-1}\right)_{n,k} = (n-k)\binom{n-1}{k-1} = k\binom{n-1}{k}.$$
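The Remark can be checked by brute force on small cases. This sketch assumes that the coboundary size of a set counts the adjacent pairs (u, v) with u inside and v outside the set, which is how the proofs above count it; it also checks that every 0-part vector has exactly k neighbours in the 1-part. The helper names are illustrative, and the code does not attempt to verify the minimality conjecture itself.

```python
from itertools import combinations
from math import comb


def nk_vectors(n, k):
    """All 0/1 tuples of length n with exactly k ones."""
    return [tuple(1 if i in ones else 0 for i in range(n))
            for ones in combinations(range(n), k)]


def neighbours(u):
    """n,k-vectors obtained from u by swapping one 1 with one 0."""
    ones = [i for i, a in enumerate(u) if a]
    zeros = [i for i, a in enumerate(u) if not a]
    for i in ones:
        for j in zeros:
            v = list(u)
            v[i], v[j] = 0, 1
            yield tuple(v)


def coboundary(S):
    """Number of (member, non-member) adjacent pairs, i.e. edges leaving S."""
    S = set(S)
    return sum(1 for u in S for v in neighbours(u) if v not in S)


def one_part(n, k):
    """Vectors whose n-th (last) coordinate equals one."""
    return [u for u in nk_vectors(n, k) if u[-1] == 1]


print(coboundary(one_part(6, 2)), 4 * comb(5, 1), 2 * comb(5, 2))  # 20 20 20
```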

Lemma 5.4

If $k \le (n-1)/2$ and

$$\tfrac12\binom{n-1}{k} \le d < \binom{n-1}{k-1},$$

then

$$\mathrm{Mcob}(d)_{n,k} \ge k\binom{n-1}{k}.$$

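Proof. Let $d' = \binom{n-1}{k} - d$. Then $\mathrm{Mcob}(d')_{n-1,k} = \mathrm{Mcob}(d)_{n-1,k}$. Since $H(d)_{n,k}$ is in the 0-part of n,k space, $\mathrm{Mcob}(d)_{n,k} = kd + \mathrm{Mcob}(d)_{n-1,k}$, and so we need to know what $H(d')_{n-1,k}$ looks like. Note that $\binom{n-2}{k-1} \le \frac12\binom{n-1}{k}$, so either

$$\binom{n-2}{k-1} \le d' \le \tfrac12\binom{n-1}{k}$$

or

$$d' < \binom{n-2}{k-1}.$$

In the first case, since $k \le (n-1)/2$, Lemma 5.3 applies and

$$\mathrm{Mcob}(d')_{n-1,k} \ge \mathrm{Mcob}\left(\binom{n-2}{k-1}\right)_{n-1,k} = k\binom{n-2}{k}.$$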

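Therefore,

$$\mathrm{Mcob}(d)_{n,k} \ge kd + k\binom{n-2}{k} \ge \frac{k}{2}\binom{n-1}{k} + k\binom{n-2}{k}.$$

So we need to show that $k\binom{n-2}{k} \ge \frac{k}{2}\binom{n-1}{k}$, which turns out to be equivalent to

$$\frac{n-1}{2} \ge k.$$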
In the second case, $\mathrm{Mcob}(d)_{n,k} = kd + \mathrm{Mcob}(d')_{n-1,k}$ and, according to the conjecture, $H(d')_{n-1,k}$ is in the 0-part of n − 1,k space, so $\mathrm{Mcob}(d')_{n-1,k} \ge kd'$. But then

$$\mathrm{Mcob}(d)_{n,k} = kd + \mathrm{Mcob}(d')_{n-1,k} \ge kd + kd' = k\binom{n-1}{k},$$

and Lemma 5.4 is proved. |

Lemma 5.5 If $k \le (n-1)/2$ and

$$\binom{n-2}{k-1} \le d \le \tfrac12\binom{n-1}{k},$$

then

$$\mathrm{Mcob}(d)_{n,k} \ge k\binom{n-1}{k}.$$


Proof. By Lemma 5.3, $\mathrm{Mcob}(d)_{n-1,k} \ge k\binom{n-2}{k}$. But since $d < \binom{n-1}{k-1}$, it follows that $\mathrm{Mcob}(d)_{n,k} = kd + \mathrm{Mcob}(d)_{n-1,k}$, and so the same argument as for the first case of Lemma 5.4 works. |


Lemma 5.6.0 Suppose

$$\binom{n-i-1}{k-1} \le d < \binom{n-i}{k-1}$$

and i is such that $k \le (n-i)/2$. Then

$$\mathrm{Mcob}(d)_{n,k} \ge k\left[\binom{n-i}{k} + (i-1)\binom{n-i-1}{k-1}\right].$$

Proof. By Lemmas 5.4 and 5.5,

$$\mathrm{Mcob}(d)_{n-i+1,k} \ge k\binom{n-i}{k}.$$

Since $d < \binom{n-i}{k-1}$, $H(d)_{n-i+1,k}$ is in the 0-part of n − i + 1,k space; in fact, $H(d)_{n,k}$ is in the 0-part of the 0-part of ... the 0-part of n,k space. Therefore,

$$\mathrm{Mcob}(d)_{n,k} = \mathrm{Mcob}(d)_{n-i+1,k} + kd(i-1).$$

Combining this with the above gives

$$\mathrm{Mcob}(d)_{n,k} \ge k\binom{n-i}{k} + kd(i-1) \ge k\binom{n-i}{k} + k(i-1)\binom{n-i-1}{k-1},$$

which is the desired result. |

The reasoning used in the previous proof will be needed later, so we state it as a lemma.

Lemma 5.6.1. Suppose $d < \binom{n-i}{k-1}$, where $k \le (n-i+1)/2$, and suppose

$$\mathrm{Mcob}(d)_{n-i+1,k} \ge f$$

for some number f. Then

$$\mathrm{Mcob}(d)_{n,k} \ge f + kd(i-1).$$

Proof. Since $k \le (n-i+1)/2$ and $d < \binom{n-i}{k-1}$, we know that $H(d)_{n-i+1,k}$ is in the 0-part of n − i + 1,k space. Similarly, since $k \le (n-i+2)/2$ and $d < \binom{n-i+1}{k-1}$, we know that $H(d)_{n-i+2,k}$ is in the 0-part of n − i + 2,k space, and so on; clearly the last i − 1 coordinates of $H(d)_{n,k}$ equal zero. But any n,k-vector whose last i − 1 coordinates equal zero has precisely $k(i-1)$ neighbors whose last i − 1 coordinates are not all zeroes. Therefore,

$$dk(i-1) + \mathrm{Mcob}(d)_{n-i+1,k} = \mathrm{Mcob}(d)_{n,k},$$

and the result follows immediately. |
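The counting step in this proof can be checked directly: padding any set of d (n − i + 1),k-vectors with i − 1 zero coordinates adds exactly dk(i − 1) to its coboundary, because each vector gains k(i − 1) neighbours outside the set (a 1 moved into a padded position) and no new neighbours inside it. The sketch below uses the same illustrative convention as before (coboundary = edges leaving the set) and checks the identity for arbitrary sets, not only the conjectured optimal ones.

```python
from itertools import combinations


def nk_vectors(n, k):
    """All 0/1 tuples of length n with exactly k ones."""
    return [tuple(1 if i in ones else 0 for i in range(n))
            for ones in combinations(range(n), k)]


def neighbours(u):
    """Vectors obtained from u by swapping one 1 with one 0."""
    ones = [i for i, a in enumerate(u) if a]
    zeros = [i for i, a in enumerate(u) if not a]
    for i in ones:
        for j in zeros:
            v = list(u)
            v[i], v[j] = 0, 1
            yield tuple(v)


def coboundary(S):
    """Number of (member, non-member) adjacent pairs."""
    S = set(S)
    return sum(1 for u in S for v in neighbours(u) if v not in S)


def pad(S, extra):
    """Append `extra` zero coordinates (the last i - 1 coordinates) to each vector."""
    return [u + (0,) * extra for u in S]


small = nk_vectors(6, 2)[:7]          # an arbitrary set with d = 7
print(coboundary(pad(small, 2)) - coboundary(small))  # 28 = d * k * (i - 1)
```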

Lemma  5.7.0.  Suppose  k  =  n/2  and 
Then


$$\mathrm{Mcob}(d)_{n,k} \ge kd + (k-1)\binom{n-2}{k-1}.$$

Proof. Since $d < \binom{n-1}{k-1}$, it follows that $H(d)_{n,k}$ is in the 0-part of n,k space. Therefore,

$$\mathrm{Mcob}(d)_{n,k} = \mathrm{Mcob}(d)_{n-1,k} + kd.$$

Also, since $k + (k-1) = n - 1$,

$$\mathrm{Mcob}(d)_{n-1,k} = \mathrm{Mcob}(d)_{n-1,k-1}.$$

Let $d' = \binom{n-1}{k-1} - d$. Then

$$\mathrm{Mcob}(d)_{n-1,k-1} = \mathrm{Mcob}(d')_{n-1,k-1}.$$

Since $k - 1 < (n-1)/2$, Lemmas 5.4 and 5.5 apply to the appropriate ranges of d and d', giving

$$\mathrm{Mcob}(d)_{n-1,k-1} = \mathrm{Mcob}(d')_{n-1,k-1} \ge (k-1)\binom{n-2}{k-1},$$

so that

$$\mathrm{Mcob}(d)_{n,k} \ge (k-1)\binom{n-2}{k-1} + kd,$$

the desired result. |

Remark.  Under  the  conditions  of  Lemma  5.7.0, 

-  »(* : !) + w  ^  -  *>(*  -  0 +  (* :  0 ■ l3k  -  2)(*  -  0- 
Lemma 5.7.1. If k = n/2 and


j 


then 

Mcob[d)n<k  >(k  -  1)(”  _  J)  +  >  *(*  _  J)  -  (l  _  2} 

Proof. Using terminology from Lemma 5.7.0, $d' < \binom{n-2}{k-2}$, so $H(d')_{n-1,k-1}$ is in the 0-part of n − 1,k − 1 space. Therefore,

$$\mathrm{Mcob}(d')_{n-1,k-1} \ge d'(k-1)$$

and

$$\mathrm{Mcob}(d)_{n,k} \ge d'(k-1) + dk = (k-1)\binom{n-1}{k-1} + d,$$

the desired result. |

Remark.  Under  the  conditions  of  Lemma  5.7.1, 

Lemma 5.7.2. If k = n/2 and $\binom{n-3}{k-2} \le d < \binom{n-1}{k-1}$, then

$$\mathrm{Mcob}(d)_{n,k} \ge (3k-2)\binom{n-3}{k-2} = \left(\tfrac32 k - 1\right)\binom{n-2}{k-1}.$$

Proof.  This  is  just  Lemmas  5.7.0,  5.7.1,  and  the  Remarks.  | 

Combining  Lemmas  5.7.2  and  5.6.1  gives 
•S»"-


.  -*»E-  A 


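Lemma 5.8. Suppose $d \le \frac12\binom{n}{k}$ and $k \le n/2$. Let p be such that $k = (n-p+1)/2$. If

$$\binom{n-p-2}{k-2} \le d < \binom{n-p}{k-1}$$

then

$$\mathrm{Mcob}(d)_{n,k} \ge dk(p-1) + (3k-2)\binom{n-p-2}{k-2} \ge (kp + 2k - 2)\binom{n-p-2}{k-2}.$$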

The only remaining case of Lemma 5.10 is $d < \binom{n-p-2}{k-2}$. If $\mathrm{Mcob}(d)_{n-p+1,k} \ge f$, then $\mathrm{Mcob}(d)_{n,k} \ge f + kd(p-1)$, so the problem is to find a bound on $\mathrm{Mcob}(d)_{n-p+1,k}$ when $d < \binom{n-p-2}{k-2}$. Let us express this in different terms. Since n − p + 1 = 2k, we want a bound on $\mathrm{Mcob}(d)_{2k,k}$ when $d < \binom{2k-3}{k-2} = \frac12\binom{2k-2}{k-1}$. What does $H(d)_{2k,k}$ look like? Clearly it is in the 0-part of 2k,k space, so

$$\mathrm{Mcob}(d)_{2k,k} = kd + \mathrm{Mcob}(d)_{2k-1,k}.$$

Also,

$$\mathrm{Mcob}(d)_{2k-1,k} = \mathrm{Mcob}(d)_{2k-1,k-1}.$$

The problem now is to find a bound on $\mathrm{Mcob}(d)_{2k-1,k-1}$ where $d < \binom{2k-3}{k-2}$. Again, $H(d)_{2k-1,k-1}$ is in the 0-part of 2k − 1,k − 1 space, so

$$\mathrm{Mcob}(d)_{2k-1,k-1} = (k-1)d + \mathrm{Mcob}(d)_{2k-2,k-1}.$$


What is $\mathrm{Mcob}(d)_{2k-2,k-1}$? Since $k - 1 = (2k-2)/2$, we may apply Lemma 5.7.2 in 2k − 2,k − 1 space: when

$$\binom{2k-5}{k-3} \le d < \binom{2k-3}{k-2},$$

we find that

$$\mathrm{Mcob}(d)_{2k-2,k-1} \ge (3k-5)\binom{2k-5}{k-3}.$$


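Now

$$\mathrm{Mcob}(d)_{2k-2,k-1} + (k-1)d = \mathrm{Mcob}(d)_{2k-1,k}$$

and

$$\mathrm{Mcob}(d)_{2k-1,k} + kd = \mathrm{Mcob}(d)_{2k,k};$$

hence

$$\mathrm{Mcob}(d)_{2k-2,k-1} + (2k-1)d = \mathrm{Mcob}(d)_{2k,k}.$$

We have shown that in the case $\binom{2k-5}{k-3} \le d < \binom{2k-3}{k-2}$,

$$\mathrm{Mcob}(d)_{2k,k} \ge (2k-1)d + (3k-5)\binom{2k-5}{k-3}.$$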

In the other case, $d < \binom{2k-5}{k-3}$, let $k' = k - 1$. We want a bound on $\mathrm{Mcob}(d)_{2k',k'}$ when

$$d < \binom{2k'-3}{k'-2}.$$

Therefore, repeated use of the preceding will solve the problem. We have proved

Lemma 5.9.0 Suppose $d < \binom{2k-3}{k-2}$. If $d \ge \binom{2k-5}{k-3}$, then

$$\mathrm{Mcob}(d)_{2k,k} \ge (2k-1)d + (3k-5)\binom{2k-5}{k-3}.$$

Otherwise,

$$\mathrm{Mcob}(d)_{2k,k} = \mathrm{Mcob}(d)_{2k-2,k-1} + (2k-1)d.$$
To cover the remaining ranges of d, we repeatedly apply Lemma 5.9.0 and add a $k(p-1)d$ term from Lemma 5.6.1, yielding:

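Lemma 5.9.1. Suppose $d < \binom{2k-3}{k-2}$. Then for some $m \ge 2$,

$$\binom{2k-2m-1}{k-m-1} \le d < \binom{2k-2m+1}{k-m}$$

and

$$
\begin{aligned}
\mathrm{Mcob}(d)_{n,k} &\ge kd(p-1) + d\sum_{i=1}^{m-2}\bigl(2(k-i)+1\bigr) + (5k-5m+4)\binom{2k-2m-1}{k-m-1}\\
&\ge \bigl[(m-2)(2k-m+2) + 4k - 5m + 4 + kp\bigr]\binom{2k-2m-1}{k-m-1}.
\end{aligned}
$$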


This  completes  the  last  case  of  Lemma  5.10.  | 
CHAPTER 6.

PROBLEMS  WITH  MULTIPLE  LOCAL  OPTIMA 


6.1  All  Orderings  Equally  Likely 

What  happens  when  the  problem  is  not  local-global?  For  instance,  what 
happens  in  general  with  LCP,  or  with  the  clique  problem?  In  this  case,  a  local 
improvement  algorithm  is  not  guaranteed  to  find  a  solution;  recall  that  a  forest 
of  trees  may  be  generated  instead  of  a  single  tree.  There  are  algorithms  for  some 
“hard”  combinatorial  problems,  such  as  0-1  integer  programming  or  the  travelling 
salesman  problem,  which  make  use  of  local  improvement,  and  are  justified  because 
there  is  no  known  way  to  solve  them  exactly  in  a  reasonable  amount  of  time. 
Many artificial intelligence applications employ hill climbing, although the problems often turn out not to be local-global (Nilsson, 1981; Winston, 1977). The obvious questions to ask are “what are the chances of a local improvement algorithm working?” and “how long will such an algorithm take?” These are equivalent to the questions “how many trees are in the forest?” and “how high are the trees?”


Proposition 6.1 Under the assumption that all orderings are equally likely, the expected number of trees in the OAF is equal to $2^n/(n+1)$.

Proof. Let x denote a vertex of the n-cube. For x = 0 to $2^n - 1$, let the random variable $I_x$ equal one if x is a local optimum and zero otherwise. Since each tree of the OAF is rooted at a local optimum, the expected number of trees equals the expected number of local optima, which is

$$E\left(\sum_{x=0}^{2^n-1} I_x\right) = \sum_{x=0}^{2^n-1} E(I_x). \qquad (6.1.1)$$

The probability of x being a local optimum is the probability that it is the highest of n + 1 vertices in the ordering (it and its n neighbors). If all orderings are equally likely, this probability is 1/(n + 1). Thus

$$E(I_x) = 1/(n+1), \qquad x = 0, \dots, 2^n - 1. \qquad (6.1.2)$$

Combining equations (6.1.1) and (6.1.2) yields the desired result. |
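For very small cubes the expectation can be computed exactly by enumerating every ordering, which gives an independent check of Proposition 6.1. In this sketch an ordering is encoded as a rank for each vertex (higher rank means better), and the neighbours of x are the vertices x XOR $2^b$; the function name is illustrative. The enumeration is factorial in $2^n$, so it is practical only up to n = 3.

```python
from fractions import Fraction
from itertools import permutations


def expected_local_optima(n):
    """Exact average number of local optima of the n-cube over all orderings.

    rank[x] is the position of vertex x in the ordering (higher = better);
    x is a local optimum when its rank beats all n neighbours x ^ (1 << b).
    """
    size = 1 << n
    total = 0
    orderings = 0
    for rank in permutations(range(size)):
        orderings += 1
        for x in range(size):
            if all(rank[x] > rank[x ^ (1 << b)] for b in range(n)):
                total += 1
    return Fraction(total, orderings)


for n in (1, 2):
    print(n, expected_local_optima(n), Fraction(2 ** n, n + 1))
```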

For problems with all orderings equally likely, then, a local improvement algorithm by itself has little chance of attaining a global optimum. This is true even for parallel processing versions that use multiple starting points (unless there are exponentially many). In view of the results for LG problems, we might expect the average pathlength of nodes in the OAFs to be small. This is indeed the case:

Simulation Result 6.2 Under the assumption that all orderings are equally likely, the average mean pathlength is linear in n.

Proposition 6.3 Under the assumption that all orderings are equally likely, the average
mean  pathlength  is  less  than  \en2 . 

Proof.  This  result  follows  easily  from  a  computation  similar  to  the  ones  in  the 
proofs  in  the  previous  chapters. 

6.2  Boundary  Uniform  Distributions 

The  assumption  that  all  orderings  are  equally  likely  is  appealing  because  it 
is  easily  stated.  However,  it  may  not  be  realistic.  In  particular,  it  fails  to  take 
into  account  correlations  between  function  values  of  neighboring  points.  This  is 
evidenced  by  the  conflict  between  the  exponentially  many  local  optima  in  Proposition 
6.1  and  the  high  probability  of  there  being  one  local  optimum  (as  discussed  in  2.3) 
when  there  is  positive  correlation.  The  boundary  and  coboundary  distributions 
naturally  incorporate  some  positive  correlation  between  function  values  of  adjacent 
points,  but  are  of  course  not  suitable  for  producing  non-LG  problems. 

We now define two classes of distributions on non-LG problems which are extensions of the boundary and coboundary distributions. A probability distribution on orderings is said to be boundary uniform if all members of the boundary set of the first i vertices in the ordering have an equal probability of being the (i + 1)st in the ordering. Similarly, a distribution is said to be coboundary uniform if the relative chances of boundary members are weighted according to the number of neighbors they have among the first i points in the ordering. There are no restrictions on vertices not in the boundary: their individual probabilities can differ widely, and the overall probability that a non-boundary member is chosen (and hence that another local optimum is introduced) can vary depending on what the first i vertices are. Since the pathlength of a starting vertex which is a local optimum is one, we can only decrease average mean pathlength by allowing additional local optima. Therefore, the bounds calculated earlier for the boundary (respectively coboundary) distribution extend to the class of boundary uniform (respectively coboundary uniform) distributions. We state this result in the following theorem.

Theorem 6.4 The expected mean pathlength of an Optimal Adjacency Forest (OAF) or a Better Adjacency Forest (BAF), under any boundary uniform distribution, is less than $en^2$. The expected mean pathlength of a BAF from any coboundary uniform distribution is less than $2en\log n$. The expected mean pathlength of an OAF from any coboundary uniform distribution is less than $2en^2\log n$.
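For experimentation, the boundary and coboundary distributions on the n-cube can be sampled as follows. This is a sketch under stated assumptions: the first (optimal) vertex is drawn uniformly at random, and every later vertex is drawn from the boundary only, so the result is always a single tree. A boundary uniform or coboundary uniform distribution in the sense defined above may additionally place non-boundary vertices, which this sampler never does. The function name and the `weighted` flag are illustrative.

```python
import random


def coboundary_ordering(n, rng=None, weighted=True):
    """Sample an ordering of the 2**n cube vertices, best vertex first.

    Each new vertex is drawn from the boundary of the vertices already placed
    (unplaced vertices with at least one placed neighbour). With weighted=True
    a boundary vertex is chosen with probability proportional to its number of
    placed neighbours (coboundary distribution); with weighted=False all
    boundary vertices are equally likely (boundary distribution).
    """
    rng = rng or random.Random()
    first = rng.randrange(1 << n)       # assumption: uniform starting vertex
    order, placed = [first], {first}
    count = {}                          # boundary vertex -> placed neighbours
    for b in range(n):
        v = first ^ (1 << b)
        count[v] = count.get(v, 0) + 1
    while count:
        cands = list(count)
        weights = [count[v] if weighted else 1 for v in cands]
        nxt = rng.choices(cands, weights=weights)[0]
        del count[nxt]
        placed.add(nxt)
        order.append(nxt)
        for b in range(n):
            v = nxt ^ (1 << b)
            if v not in placed:
                count[v] = count.get(v, 0) + 1
    return order


print(coboundary_ordering(3, random.Random(0)))
```

Recording, for each vertex after the first, an earlier neighbour as its father turns such an ordering into a tree of the kind analysed in the preceding chapters.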


6.3 The Local-Global Property and NP-complete Problems

There appears to be a considerable difference between problems that are LG and those that are not. In particular, it seems that the well-known NP-complete problems belong to the latter group. For instance, we have the following proposition:

Proposition 6.5 In the travelling salesman problem, if two Hamiltonian circuits differ only in the order in which two consecutive cities are visited, they are called adjacent. Then, with this notion of adjacency, there exists a class of instances with exponentially many local optima that are not global optima.

Proof. As the basis for our class of instances, we use a graph with six nodes labelled a, a', b, c, d, and e. The nodes a, b, c, and d form a rectangle with lengths ab = cd = 24, ad = bc = 10, and ac = bd = 26. Node e is located midway between the short sides and a little closer to side cd than to side ab; thus ed = ec = 12.5 and ea = eb = 14.5. The node a' is at some very small distance from node a, so its distances to other nodes are the same as for a. We remark that the circuit a, d, c, b, e, a', a is a local optimum but is not globally optimal, since its cost is three more than that of the circuit a, d, e, c, b, a', a. The latter circuit is globally, and of course locally, optimal. Now construct n copies of this graph, setting all distances between nodes in different copies to 100, except that the a and a' nodes are at a distance of 20. Any circuit that starts at some a, goes around that copy with either of the two locally optimal circuits discussed above (leaving out a, a'), proceeds to another copy and goes around it with one of the two locally optimal circuits, etc., will be a local optimum in the n-copy graph. However, only one of these $2^n$ circuits will be globally optimal (the one that always uses the second choice). We have constructed a graph with 6n nodes which has at least $2^n - 1$ local optima that are not global optima. |

We can further sharpen the apparent distinction between LG and NP-complete problems by showing that LG problems are essentially in NP ∩ co-NP (and hence unlikely to be NP-complete). To be precise, we define the set recognition version of the optimization problem

$$\max_{x \in X} f(x)$$

to be the following question: Given an instance and a number k, does there exist an $x \in X$ such that $f(x)$ is at least k?

Proposition 6.6 Suppose that, for some discrete optimization problem

$$\max_{x \in X} f(x),$$

there exists a notion of adjacency which assigns neighbors to each point in such a way that (1) the assignment of neighbors is independent of the instance of the problem (independent of the particular data), and (2) each vertex has polynomially many neighbors. Then if the problem is LG, its set recognition version is in NP ∩ co-NP.
Proof. Membership in NP is immediate, since a point x with f(x) at least k is itself a certificate. If, given some particular data and number k, there is no $x \in X$ such that $f(x)$ is at least k, this fact can be proved in nondeterministic polynomial time by “guessing” the true optimum, showing its value is less than k, and verifying its optimality by comparing its value with the values of its (polynomially many) neighbors; because the problem is LG, a point that is at least as good as all of its neighbors is a global optimum.

Theorem 6.7 The clique problem is not LG under any data-independent, polynomial assignment of adjacency. Also, under the ordinary notion of adjacency (two subsets S and T of vertices are adjacent if one is a subset of the other and their cardinalities differ by one), there exists a class of instances with exponentially many local optima that are not global optima.

Proof. We play the adversary against an arbitrary fixed adjacency rule. The
instance we construct will have n nodes, though we will not specify what size n is
until later. Our target clique consists of the first n/4 nodes. It will be locally but
not globally maximal. We connect all of these n/4 nodes with edges so they form a
clique, and we do not make any more edges incident to these nodes. Consider the
next n/2 nodes: there are $\binom{n/2}{n/4}$ subsets of order n/4 and $\binom{n/2}{1+n/4}$ subsets of order
1 + n/4. By assumption, there exists a polynomial p(n) which bounds the number
of neighbors a subset can have. A subset of order 1 + n/4 is ruled out if it is itself
a neighbor of the target clique (at most p(n) subsets) or if it contains a neighbor of
order n/4 (each of the at most p(n) such neighbors lies in only n/4 subsets of order
1 + n/4), so at most (1 + n/4)p(n) < np(n) subsets are ruled out. We choose n to be
large enough that np(n) is smaller than $\binom{n/2}{1+n/4}$. Then there must be a subset of the
n/2 nodes with the properties that (i) it is of order 1 + n/4, (ii) it is not a neighbor
of the target clique, and (iii) it contains no subset of order n/4 that is a neighbor of
the target clique. We connect the nodes of this subset so as to make it a clique; all
pairs of nodes not in the subset and not in the target clique remain unconnected.
The subset is therefore the global maximum, but any neighbor of the target clique
will either not be a clique or be of order less than n/4. Given an arbitrary polynomial
adjacency rule, we have constructed an instance which violates the LG property.
This proves the first statement of the theorem.
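The adversary argument needs n large enough that the subsets of order 1 + n/4 it rules out (at most (1 + n/4)p(n) of them) cannot exhaust all $\binom{n/2}{1+n/4}$ candidates. The following Python sketch finds the smallest multiple of 4 satisfying the sufficient condition np(n) < $\binom{n/2}{1+n/4}$ for a given neighbor bound. The function names and the sample bounds p(n) = n and p(n) = n² are assumptions chosen for illustration.

```python
from math import comb
from typing import Callable


def excluded_bound(n: int, p: Callable[[int], int]) -> int:
    # At most p(n) candidates are neighbors themselves, plus at most
    # p(n) * (n/4) candidates that contain a neighbor of order n/4.
    return p(n) * (1 + n // 4)


def smallest_adversary_n(p: Callable[[int], int], limit: int = 10_000) -> int:
    """Smallest n (a multiple of 4) with n * p(n) < C(n/2, 1 + n/4)."""
    for n in range(4, limit + 1, 4):
        if n * p(n) < comb(n // 2, 1 + n // 4):
            return n
    raise ValueError("no suitable n found below limit")


print(smallest_adversary_n(lambda n: n ** 2))  # 40
```

Because the binomial coefficient grows exponentially while n·p(n) is polynomial, the loop always terminates for a polynomial bound; only the crossover point depends on p.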

The ordinary notion of adjacency states that two subsets of the nodes of the
graph are adjacent if one contains the other and their cardinalities differ by one. We
start with a graph whose nodes are labelled 1, 2, ..., n, and which is itself a
clique. That is, for all distinct nodes i and j, the edge (i, j) is in the graph. Now we
delete all edges of the form (i, i + 1). If a subset of the nodes contains i, it cannot
contain either i + 1 or i − 1 and still be a clique. We can build up cliques
$(v_1, \ldots, v_k, \ldots)$ by choosing $v_1$ equal to either node 1 or node 2, and $v_{k+1}$ equal
to either $v_k + 2$ or $v_k + 3$. Since the subsets {k, k + 1, k + 2}, {k, k + 1, k + 3}, and
{k, k + 2, k + 3} do not have all of their edges, the cliques we build in this way are all
locally optimal. There are more than $2^{n/3}$ of these because we made at least n/3
choices when constructing them. Moreover, most of them (all but at most 2n) are of
order less than n/2, the maximal order achieved by the clique {1, 3, 5, 7, ...}. We
therefore have exponentially many local optima that fail to be global optima. □
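The construction in the second half of this proof can be checked mechanically for small n. The Python sketch below enumerates the cliques built with steps of 2 or 3, confirms by brute force that they are exactly the local optima under the ordinary add-or-delete-one-node adjacency, and counts those that are not global. All function names are hypothetical, and the brute-force check is exponential, so it is meant only for small n.

```python
from itertools import combinations


def is_clique(S):
    # Every pair i, j is an edge except when |i - j| == 1, and any such
    # pair would be adjacent once S is sorted.
    s = sorted(S)
    return all(b - a != 1 for a, b in zip(s, s[1:]))


def is_local_max(S, n):
    # Deleting a node only shrinks S, so S is a local maximum iff it is a
    # clique and no single node can be added to it.
    return is_clique(S) and all(
        not is_clique(S | {v}) for v in range(1, n + 1) if v not in S
    )


def gap_cliques(n):
    """Cliques with v_1 in {1, 2} and v_{k+1} in {v_k + 2, v_k + 3}."""
    found = []

    def extend(seq):
        steps = [seq[-1] + d for d in (2, 3) if seq[-1] + d <= n]
        if not steps:  # the last node is n - 1 or n: nothing more fits
            found.append(frozenset(seq))
        for v in steps:
            extend(seq + [v])

    for start in (1, 2):
        if start <= n:
            extend([start])
    return found


def brute_force_local_maxima(n):
    # Exponential in n: examines every nonempty subset of the nodes.
    nodes = range(1, n + 1)
    return {
        frozenset(c)
        for r in range(1, n + 1)
        for c in combinations(nodes, r)
        if is_local_max(frozenset(c), n)
    }


cliques = gap_cliques(10)
best = max(len(S) for S in cliques)
print(len(cliques), best, sum(len(S) < best for S in cliques))  # 16 5 10
```

For n = 10 this gives 16 local optima, of which 10 are not global; the gap between the two counts widens exponentially as n grows.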

Note: Many known NP-complete problems, such as knapsack or three-dimensional
matching, originated in the form of optimization problems, so the idea of
local optimality applies immediately. Some other problems, such as satisfiability,
3-colorability, or 2-partition, are ordinarily posed as set recognition (that is, yes/no)
problems, so the concept of local optimality may not seem to apply. However, we
have found that most such problems can easily be transformed into an optimization
version. For example, the Boolean satisfiability problem becomes the problem of
assigning Boolean values to the set of variables so as to maximize the total number
of clauses that are true. The 3-colorability problem becomes the problem of assigning
one of three colors to each node in the graph so as to minimize the number
of pairs of nodes that have the same color and are connected by an edge. With
2-partition we try to minimize the difference between the sums of the two subsets;
with subgraph isomorphism (given two graphs G and H, does H contain a subgraph
isomorphic to G?) we try to find a mapping from the nodes of G into the nodes of
H that minimizes the number of conflicts in the corresponding edge sets.
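The optimization versions described in this note are easy to write down as objective functions. The Python sketch below assumes encodings that the report does not fix: clauses as lists of signed integers (+i for the i-th variable, −i for its negation), colors and edges over 0-based node indices, and a Boolean side vector for 2-partition. It also adds a local improvement for the satisfiability version that flips one variable at a time; that neighborhood is an illustrative choice, not one prescribed here.

```python
from typing import List, Sequence, Tuple

Clause = Sequence[int]  # assumed encoding: +i means x_i, -i means not x_i


def satisfied_clauses(assign: Sequence[bool], clauses: Sequence[Clause]) -> int:
    # Variable i (1-based) has value assign[i - 1]; maximize this count.
    return sum(any(assign[abs(l) - 1] == (l > 0) for l in c) for c in clauses)


def coloring_conflicts(colors: Sequence[int], edges: Sequence[Tuple[int, int]]) -> int:
    # Edges whose endpoints share a color; minimize this count.
    return sum(colors[u] == colors[v] for u, v in edges)


def partition_difference(nums: Sequence[int], side: Sequence[bool]) -> int:
    # |sum of one subset - sum of the other|; minimize this value.
    a = sum(x for x, s in zip(nums, side) if s)
    return abs(sum(nums) - 2 * a)


def local_improve_maxsat(assign: List[bool], clauses: Sequence[Clause]) -> List[bool]:
    """Flip single variables while some flip strictly increases the count."""
    assign = list(assign)
    best = satisfied_clauses(assign, clauses)
    improved = True
    while improved:  # terminates: each accepted flip raises a bounded count
        improved = False
        for i in range(len(assign)):
            assign[i] = not assign[i]
            value = satisfied_clauses(assign, clauses)
            if value > best:
                best, improved = value, True
            else:
                assign[i] = not assign[i]  # undo a non-improving flip
    return assign


clauses = [[1, 2], [-1, 2], [1, -2]]
print(local_improve_maxsat([False, False], clauses))  # [False, False]
```

From the all-false assignment no single flip helps (it stays at two satisfied clauses), although setting both variables true satisfies all three: a small instance of a local optimum that is not global.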

Using this notion of optimization versions of NP-complete problems, we can
now discuss local and global optimality. We believe that all NP-complete problems
have the property of exponentially many local optima. However, since this statement
implies that P ≠ NP, a proof will not be attempted. (If P = NP, then any LG
problem in P, such as linear programming, would be NP-complete.) We do remark,
however, that many of the polynomial transformations used in NP-completeness
results preserve the “topology” of adjacencies in such a way that local optima remain
local optima. It is usually easy to show that some particular NP-complete problem
is not LG.

The  implications  of  the  above  results  are  that  local  improvement  algorithms 
will  work  quickly  but  have  little  chance  of  finding  the  true  optimum.  Even  with 
polynomially  many  different  starting  points,  chances  of  success  become  small  as  n 
gets  large.  This  assumes  that  the  starting  point  or  points  are  chosen  at  random;  a 
heuristic  that  finds  a  good  starting  point  or  points,  for  instance,  can  of  course  make 
quite  a  difference.  This  is  the  case  in  algorithms  for  solving  integer  programming 
problems,  where  local  improvement  is  an  inexpensive  way  to  improve  a  “good” 
solution  (Hillier,  1969). 
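The effect of repeated random starts can be illustrated on the clique instance from the proof of Theorem 6.7. In the Python sketch below, each run starts from a randomly chosen single node and then makes randomly chosen improving moves (adding a node) until it reaches a local optimum; a trial succeeds if any of its runs reaches the maximum order. The instance, the random start rule, the random choice among improving moves, and all function names are assumptions for illustration; the report does not prescribe them.

```python
import random


def random_local_max_clique(n, rng):
    """Local improvement on nodes 1..n with an edge (i, j) unless |i - j| == 1:
    start from one random node, then add randomly chosen addable nodes."""
    clique = {rng.randint(1, n)}
    while True:
        addable = [
            v for v in range(1, n + 1)
            if v not in clique and v - 1 not in clique and v + 1 not in clique
        ]
        if not addable:  # no improving neighbor: a local optimum
            return clique
        clique.add(rng.choice(addable))


def multistart_success_rate(n, starts, trials, seed=0):
    """Fraction of trials in which at least one of `starts` runs finds a
    clique of the maximum order (n + 1) // 2."""
    rng = random.Random(seed)
    target = (n + 1) // 2
    hits = 0
    for _ in range(trials):
        if any(len(random_local_max_clique(n, rng)) == target for _ in range(starts)):
            hits += 1
    return hits / trials


for n in (6, 20, 40):
    print(n, multistart_success_rate(n, starts=3, trials=300))
```

With the number of starts held fixed, the estimated success rate should fall sharply as n grows, in line with the remark above that randomly started local improvement rarely finds the true optimum on large instances.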